Experiment data.
Facial expression recognition (FER) is significantly influenced by the cultural background (CB) of observers and the masking conditions of the target face. This study aimed to clarify the impact of these factors on FER, particularly in machine-learning datasets, which are increasingly used in human-computer interaction and automated systems. We conducted an FER experiment with East Asian participants and compared the results with the FERPlus dataset, which was evaluated by Western raters. Our novel analysis approach focused on the variability between images and participants within a "majority" category and on the eye-opening rate of target faces, providing a deeper understanding of FER processes. Notable findings were differences in "fear" perception between East Asians and Westerners, with East Asians more likely to interpret "fear" as "surprise." Masking conditions significantly affected emotion categorization: faces perceived as "fear" by East Asians when non-masked were interpreted as "surprise" when masked. Thus, under the masking condition, expressions were perceived as belonging to different emotion categories, rather than simply showing the lower recognition rates or confusion reported in existing studies. Additionally, "sadness" perceived by Westerners was often interpreted as "disgust" by East Asians. These results suggest that one-to-one network learning models, commonly trained using majority labels, might overlook important minority response information, potentially leading to biases in automated FER systems. In conclusion, FER dataset characteristics differ depending on the target face’s masking condition and the diversity among evaluation groups. This study highlights the need to consider these factors in machine-learning-based FER that relies on human-judged labels, in order to contribute to the development of more nuanced and fair automated FER systems.
Our findings emphasize the novelty of our approach compared to existing studies and the importance of incorporating a broader range of human variability in FER research, setting the stage for future evaluations of machine learning classifiers on similar data.
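To illustrate the point about majority labels and minority responses, the following is a minimal sketch, not the analysis code used in this study. The function name `summarize_votes` and the vote counts are hypothetical and chosen only for illustration. The sketch shows how reducing the raters' responses for one image to a single majority label discards the share of raters who chose another category.

```python
from collections import Counter


def summarize_votes(votes):
    """Return (majority_label, distribution) for one image's rater responses.

    `votes` is a list of emotion labels, one per rater (hypothetical format).
    """
    if not votes:
        raise ValueError("at least one rater response is required")
    counts = Counter(votes)
    total = sum(counts.values())
    # Full response distribution: keeps minority responses.
    distribution = {label: n / total for label, n in counts.items()}
    # Majority label: the single most frequent response.
    majority_label = counts.most_common(1)[0][0]
    return majority_label, distribution


# Hypothetical responses from ten raters for a single image (not study data).
example_votes = ["fear"] * 6 + ["surprise"] * 4
majority, dist = summarize_votes(example_votes)
print(majority, dist)
```

In this hypothetical case the majority label is "fear", yet four of the ten raters chose "surprise"; a model trained only on the majority label would never see that information, whereas the full distribution retains it.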