Unveiling the human-like similarities of automatic facial expression recognition: An empirical exploration through explainable AI

dc.contributor.author Gaya-Morey, F. Xavier
dc.contributor.author Ramis-Guarinos, Silvia
dc.contributor.author Manresa-Yee, C.
dc.contributor.author Buades-Rubio, José M.
dc.date.accessioned 2025-01-30T08:42:30Z
dc.date.available 2025-01-30T08:42:30Z
dc.identifier.citation Gaya-Morey, F. X., Ramis-Guarinos, S., Manresa-Yee, C., and Buades-Rubio, J. M. (2024). Unveiling the human-like similarities of automatic facial expression recognition: An empirical exploration through explainable AI. Multimedia Tools and Applications, 83(38), 85725-85753. https://doi.org/10.1007/s11042-024-20090-5
dc.identifier.uri http://hdl.handle.net/11201/168257
dc.description.abstract [eng] Facial expression recognition is vital for human behavior analysis, and deep learning has enabled models that can outperform humans. However, it is unclear how closely they mimic human processing. This study explores the similarity between deep neural networks and human perception by comparing twelve different networks, including both general object classifiers and FER-specific models. We employ an innovative global explainable AI method to generate heatmaps revealing the facial regions crucial to the twelve networks trained on six facial expressions. We assess these results quantitatively and qualitatively, comparing them both to ground-truth masks based on Friesen and Ekman’s description and with one another, using Intersection over Union (IoU) and normalized correlation coefficients. We generate 72 heatmaps highlighting the critical regions for each expression and architecture. Qualitatively, models with pre-trained weights produce more similar heatmaps than those trained without pre-training. Eye and nose areas influence certain facial expressions, while the mouth is consistently important across all models and expressions. Quantitatively, average IoU values are low (0.2702) across all expressions and architectures; the best-performing architecture averages 0.3269, the worst 0.2066. Dendrograms built with the normalized correlation coefficient reveal two main clusters for most expressions: models with pre-training and models without. The findings suggest limited alignment between human and AI facial expression recognition, with network architecture influencing similarity, as similar architectures prioritize similar facial regions.
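dc.description.note The two quantitative comparisons named in the abstract can be illustrated with a minimal sketch (not the authors' code): IoU between a binarized relevance heatmap and a ground-truth facial-region mask, and a Pearson-style normalized correlation coefficient between two heatmaps. The 0.5 binarization threshold, the 224x224 resolution, and the toy mask are assumptions for illustration only.

import numpy as np


def iou(heatmap: np.ndarray, mask: np.ndarray, threshold: float = 0.5) -> float:
    """IoU between a thresholded heatmap and a binary ground-truth mask.

    The 0.5 threshold is an illustrative assumption, not the paper's value.
    """
    pred = heatmap >= threshold                  # binarize the relevance heatmap
    gt = mask.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 0.0
    return float(np.logical_and(pred, gt).sum() / union)


def normalized_correlation(a: np.ndarray, b: np.ndarray) -> float:
    """Normalized (zero-mean) correlation coefficient between two heatmaps."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a ** 2).sum() * (b ** 2).sum())
    return float((a * b).sum() / denom) if denom else 0.0


# Example: compare two random 224x224 heatmaps against a toy rectangular mask
# standing in for a facial region (e.g., the mouth area).
rng = np.random.default_rng(0)
h1, h2 = rng.random((224, 224)), rng.random((224, 224))
mask = np.zeros((224, 224))
mask[80:140, 60:160] = 1
print(f"IoU vs mask:  {iou(h1, mask):.4f}")
print(f"corr(h1, h2): {normalized_correlation(h1, h2):.4f}")

In the paper, the pairwise normalized correlations between model heatmaps feed a hierarchical clustering whose dendrograms separate pre-trained from non-pre-trained models; the sketch above covers only the two per-pair metrics.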
dc.format application/pdf
dc.relation.ispartof Multimedia Tools and Applications, 2024, vol. 83, p. 85725–85753
dc.rights Attribution 4.0 International
dc.rights.uri https://creativecommons.org/licenses/by/4.0/
dc.subject.classification 004 - Informàtica
dc.subject.other 004 - Computer Science and Technology. Computing. Data processing
dc.title Unveiling the human-like similarities of automatic facial expression recognition: An empirical exploration through explainable AI
dc.type info:eu-repo/semantics/article
dc.type info:eu-repo/semantics/publishedVersion
dc.date.updated 2025-01-30T08:42:30Z
dc.subject.keywords Cognitive Anthropomorphism
dc.subject.keywords Human-like
dc.subject.keywords Deep Learning
dc.subject.keywords Empirical Evaluation
dc.subject.keywords Explainable Artificial Intelligence
dc.subject.keywords Facial Expression Recognition
dc.rights.accessRights info:eu-repo/semantics/openAccess
dc.identifier.doi https://doi.org/10.1007/s11042-024-20090-5

