Unveiling the human-like similarities of automatic facial expression recognition: An empirical exploration through explainable AI

dc.contributor.author Gaya-Morey, F. Xavier
dc.contributor.author Ramis-Guarinos, Silvia
dc.contributor.author Manresa-Yee, C.
dc.contributor.author Buades-Rubio, José M.
dc.date.accessioned 2025-01-30T08:42:30Z
dc.date.available 2025-01-30T08:42:30Z
dc.identifier.citation Gaya-Morey, F. X., Ramis-Guarinos, S., Manresa-Yee, C., and Buades-Rubio, J. M. (2024). Unveiling the human-like similarities of automatic facial expression recognition: An empirical exploration through explainable AI. Multimedia Tools and Applications, 83(38), 85725-85753. https://doi.org/10.1007/s11042-024-20090-5
dc.identifier.uri http://hdl.handle.net/11201/168257
dc.description.abstract [eng] Facial expression recognition is vital for human behavior analysis, and deep learning has enabled models that can outperform humans. However, it is unclear how closely they mimic human processing. This study explores the similarity between deep neural networks and human perception by comparing twelve different networks, including both general object classifiers and FER-specific models. We employ an innovative global explainable AI method to generate heatmaps revealing the facial regions crucial to the twelve networks trained on six facial expressions. We assess these results quantitatively and qualitatively, comparing them both to ground-truth masks based on Friesen and Ekman’s description and with one another, using Intersection over Union (IoU) and normalized correlation coefficients. We generate 72 heatmaps highlighting the critical regions for each expression and architecture. Qualitatively, models with pre-trained weights produce more similar heatmaps than those trained without pre-training. Eye and nose areas influence certain facial expressions, while the mouth is consistently important across all models and expressions. Quantitatively, average IoU values are low (0.2702) across all expressions and architectures; the best-performing architecture averages 0.3269, the worst 0.2066. Dendrograms built with the normalized correlation coefficient reveal two main clusters for most expressions: models with pre-training and models without. The findings suggest limited alignment between human and AI facial expression recognition, with network architecture influencing similarity, as similar architectures prioritize similar facial regions.
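dc.description.note The two quantitative comparisons named in the abstract can be illustrated with a minimal sketch (not the authors' code): IoU between a binarized relevance heatmap and a ground-truth facial-region mask, and a Pearson-style normalized correlation coefficient between two heatmaps. The 0.5 binarization threshold, the 224x224 resolution, and the toy mask are assumptions for illustration only.

import numpy as np


def iou(heatmap: np.ndarray, mask: np.ndarray, threshold: float = 0.5) -> float:
    """IoU between a thresholded heatmap and a binary ground-truth mask.

    The 0.5 threshold is an illustrative assumption, not the paper's value.
    """
    pred = heatmap >= threshold                  # binarize the relevance heatmap
    gt = mask.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 0.0
    return float(np.logical_and(pred, gt).sum() / union)


def normalized_correlation(a: np.ndarray, b: np.ndarray) -> float:
    """Normalized (zero-mean) correlation coefficient between two heatmaps."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a ** 2).sum() * (b ** 2).sum())
    return float((a * b).sum() / denom) if denom else 0.0


# Example: compare two random 224x224 heatmaps against a toy rectangular mask
# standing in for a facial region (e.g., the mouth area).
rng = np.random.default_rng(0)
h1, h2 = rng.random((224, 224)), rng.random((224, 224))
mask = np.zeros((224, 224))
mask[80:140, 60:160] = 1
print(f"IoU vs mask:  {iou(h1, mask):.4f}")
print(f"corr(h1, h2): {normalized_correlation(h1, h2):.4f}")

In the paper, the pairwise normalized correlations between model heatmaps feed a hierarchical clustering whose dendrograms separate pre-trained from non-pre-trained models; the sketch above covers only the two per-pair metrics.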
dc.format application/pdf
dc.relation.ispartof Multimedia Tools and Applications, 2024, vol. 83, p. 85725–85753
dc.rights Attribution 4.0 International
dc.rights.uri https://creativecommons.org/licenses/by/4.0/
dc.subject.classification 004 - Informàtica
dc.subject.other 004 - Computer Science and Technology. Computing. Data processing
dc.title Unveiling the human-like similarities of automatic facial expression recognition: An empirical exploration through explainable AI
dc.type info:eu-repo/semantics/article
dc.type info:eu-repo/semantics/publishedVersion
dc.date.updated 2025-01-30T08:42:30Z
dc.subject.keywords Cognitive Anthropomorphism
dc.subject.keywords Human-like
dc.subject.keywords Deep Learning
dc.subject.keywords Empirical Evaluation
dc.subject.keywords Explainable Artificial Intelligence
dc.subject.keywords Facial Expression Recognition
dc.rights.accessRights info:eu-repo/semantics/openAccess
dc.identifier.doi https://doi.org/10.1007/s11042-024-20090-5

