A1 Refereed original research article in a scientific journal
A generative multimodal network for facial expression recognition
Authors: Zhao, Yue; Song, Mingjian; Zhang, Qi; Yang, Jiawei; Yoshigoe, Kenji; Tian, Chunwei
Publisher: Elsevier
Publication year: 2026
Journal: Pattern Recognition
Article number: 113518
Volume: 179
Issue: Part A
ISSN: 0031-3203
eISSN: 1873-5142
DOI: https://doi.org/10.1016/j.patcog.2026.113518
Publication's open availability at the time of reporting: No Open Access
Publication channel's open availability: Partially Open Access publication channel
Web address: https://doi.org/10.1016/j.patcog.2026.113518
Deep networks with strong feature extraction abilities have been extensively employed in facial expression recognition (FER). However, they focus on structural information derived from data dependencies rather than facial attributes, which limits the robustness of the obtained models for FER. In this paper, we propose a generative multimodal network (GMNet) for FER. First, GMNet generates and aligns multimodal face images according to facial asymmetry and the mirror imaging principle. Second, it uses parallel networks to learn diverse information from the original and generated multimodal face images, respectively, and merges the resulting features to obtain reliable facial expression information. Third, a sparse mechanism further refines these enriched facial features to extract more accurate facial expression information and reduce training costs. Finally, a cross loss applies a cross-domain restriction to guarantee the reliability of the multimodal face images and improve FER performance. Experimental results show that our GMNet is superior to other popular FER methods. The code for GMNet is available at https://github.com/hellloxiaotian/GMNet.
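The abstract's first step, generating multimodal face images from facial asymmetry and the mirror imaging principle, can be sketched as below. This is a hypothetical illustration, not the paper's implementation: it assumes the common left-left / right-right mirror-composite construction, in which each half of the face is concatenated with its horizontal reflection to produce two symmetric images.

```python
import numpy as np

def mirror_multimodal(face: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Build two mirror-composite images from a face array of shape (H, W[, C]).

    Assumed construction (not taken from the paper): split the face at the
    vertical midline, then reflect each half to form a fully symmetric image.
    """
    h, w = face.shape[:2]
    half = w // 2
    left = face[:, :half]          # left half of the face
    right = face[:, w - half:]     # right half of the face
    # Left-left composite: left half followed by its horizontal mirror.
    ll = np.concatenate([left, left[:, ::-1]], axis=1)
    # Right-right composite: mirrored right half followed by the right half.
    rr = np.concatenate([right[:, ::-1], right], axis=1)
    return ll, rr
```

Each composite is symmetric about its vertical midline, so the two outputs isolate the expression cues carried by each side of an asymmetric face; the parallel branches described in the abstract could then consume the original image and these generated views.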
Funding information in the publication:
This work was supported by Leading Talents in Gusu Innovation and Entrepreneurship [No. ZXL2023170]; and the Basic Research Programs of Taicang 2024 [No. TC2024JC32].