A1 Refereed original research article in a scientific journal
Testing human-hand segmentation on in-distribution and out-of-distribution data in human–robot interactions using a deep ensemble model
Authors: Jalayer, Reza; Chen, Yuxin; Jalayer, Masoud; Orsenigo, Carlotta; Tomizuka, Masayoshi
Publisher: Elsevier BV
Publication year: 2025
Journal: Mechatronics
Journal name in source: Mechatronics
Article number: 103365
Volume: 110
ISSN: 0957-4158
DOI: https://doi.org/10.1016/j.mechatronics.2025.103365
Web address: https://www.sciencedirect.com/science/article/pii/S0957415825000741
Self-archived copy’s web address: https://research.utu.fi/converis/portal/detail/Publication/499135968
Reliable detection and segmentation of human hands are critical for enhancing safety and facilitating advanced interactions in human–robot collaboration. Current research predominantly evaluates hand segmentation under in-distribution (ID) data, which reflects the training data of deep learning (DL) models. However, this approach fails to address out-of-distribution (OOD) scenarios that often arise in real-world human–robot interactions. In this work, we make three key contributions. First, we assess the generalization of DL models for hand segmentation under both ID and OOD scenarios, using a newly collected industrial dataset that captures a wide range of real-world conditions, including simple and cluttered backgrounds with industrial tools, varying numbers of hands (0 to 4), gloves, rare gestures, and motion blur. Second, we consider both egocentric and static viewpoints: we evaluate models trained on four datasets, namely EgoHands and Ego2Hands (egocentric, mobile camera) and HADR and HAGS (static, fixed viewpoint), testing them with both egocentric (head-mounted) and static cameras to enable robustness evaluation from multiple points of view. Third, we introduce an uncertainty analysis pipeline based on the predictive entropy of pixels predicted as hands. By applying thresholds established during the validation phase, this pipeline automatically identifies and filters untrustworthy segmentation outputs, significantly improving segmentation reliability in OOD scenarios. For segmentation, we use a deep ensemble model composed of UNet and RefineNet as base learners. Our experiments demonstrate that models trained on industrial datasets (HADR, HAGS) outperform those trained on non-industrial datasets, both in segmentation accuracy and in their ability to flag unreliable outputs via uncertainty estimation.
These findings underscore the necessity of domain-specific training data and show that our uncertainty analysis pipeline can provide a practical safety layer for real-world deployment.
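To make the uncertainty analysis pipeline concrete, the sketch below shows one way to compute pixel-wise predictive entropy from a deep ensemble's probability maps and to flag a segmentation as unreliable when the mean entropy over predicted hand pixels exceeds a validation-set threshold. This is an illustrative reconstruction, not the authors' released code: the function names, the binary (hand vs. background) formulation, and the specific flagging rule (mean entropy over the predicted hand mask) are assumptions for the example.

```python
import numpy as np

def predictive_entropy(member_probs):
    """Pixel-wise predictive entropy of a deep ensemble.

    member_probs: array of shape (M, H, W), where each of the M ensemble
    members (e.g. UNet and RefineNet) gives the probability that a pixel
    belongs to the hand class.
    """
    p = member_probs.mean(axis=0)   # ensemble-mean hand probability
    eps = 1e-12                     # guard against log(0)
    # Binary predictive entropy: -p*log(p) - (1-p)*log(1-p)
    return -(p * np.log(p + eps) + (1.0 - p) * np.log(1.0 - p + eps))

def flag_unreliable(member_probs, threshold):
    """Flag an output as untrustworthy when the mean entropy over pixels
    predicted as hand exceeds a threshold chosen on the validation set."""
    h = predictive_entropy(member_probs)
    hand_mask = member_probs.mean(axis=0) > 0.5
    if not hand_mask.any():
        return False  # no hand pixels predicted; nothing to flag
    return float(h[hand_mask].mean()) > threshold
```

An ensemble whose members agree confidently (probabilities near 0 or 1) yields low entropy and passes, while member disagreement pushes the mean probability toward 0.5, where the binary entropy peaks at ln 2, so the output is flagged; this is what allows OOD inputs to be filtered before they reach downstream safety logic.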
This is an electronic reprint of the original article.