A Comparative Study of Deep Learning-based RGB-depth Fusion Methods for Object Detection
Authors: Farahnakian Fahimeh, Heikkonen Jukka
Editors: M. Arif Wani, Feng Luo, Xiaolin (Andy) Li, Dejing Dou, Francesco Bonchi
Conference: International Conference on Machine Learning and Applications
Year: 2021
Published in: 2020 19th IEEE International Conference on Machine Learning and Applications (ICMLA)
Pages: 1475-1482
ISBN: 978-1-7281-8471-5, 978-1-7281-8470-8
DOI: https://doi.org/10.1109/ICMLA51294.2020.00228
The object detection task, which attempts to predict bounding boxes for all objects of interest in an RGB image, is of paramount importance for many real-world applications and has attracted much attention within the computer vision community. However, RGB cameras cannot directly provide depth information, and RGB-based object detectors cannot achieve accurate performance in complex environments. To address this problem, we make two contributions in this paper. Firstly, the performance of four state-of-the-art unsupervised depth estimation methods was thoroughly evaluated in the context of object detection, which can serve as a baseline for other researchers to develop even more sophisticated methods. Secondly, we investigated whether fusing depth information with RGB can improve the performance of object detection networks. The obtained results on the KITTI dataset show that the RGB-depth fusion approach with MonoDepth as the depth estimation method outperforms the RGB-based and depth-based detectors.
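As a concrete illustration of the fusion idea, the sketch below shows one simple way to combine an RGB frame with an estimated depth map by early fusion, i.e. stacking the depth map as a fourth input channel for a detector backbone. This is a minimal sketch under assumptions: the depth map is taken to come from a monocular estimator such as MonoDepth (not reimplemented here), the fuse_rgb_depth helper and the image sizes are illustrative only, and the paper's actual fusion architecture may differ.

# Minimal sketch of early RGB-depth fusion by channel concatenation.
# Assumption: the depth map is produced by a monocular depth estimator
# (e.g. MonoDepth); the fusion scheme shown here is illustrative and is
# not necessarily the architecture used in the paper.
import numpy as np

def fuse_rgb_depth(rgb: np.ndarray, depth: np.ndarray) -> np.ndarray:
    """Stack an HxWx3 RGB image and an HxW depth map into an HxWx4 array."""
    if rgb.shape[:2] != depth.shape[:2]:
        raise ValueError("RGB and depth must share the same spatial size")
    # Normalize depth to [0, 1] so its scale is comparable to the RGB channels.
    d = depth.astype(np.float32)
    d = (d - d.min()) / (d.max() - d.min() + 1e-8)
    return np.concatenate([rgb.astype(np.float32) / 255.0, d[..., None]], axis=-1)

if __name__ == "__main__":
    rgb = np.random.randint(0, 256, (375, 1242, 3), dtype=np.uint8)  # KITTI-sized frame
    depth = np.random.rand(375, 1242).astype(np.float32)             # placeholder depth map
    fused = fuse_rgb_depth(rgb, depth)
    print(fused.shape)  # (375, 1242, 4): 4-channel input for a detector backbone

The resulting 4-channel input would then be fed to a detection network whose first convolutional layer accepts four channels instead of three; other fusion strategies (e.g. late fusion of separate RGB and depth streams) are also possible and are the kind of design choice the paper compares.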