Research on Video Emotion Recognition Method Based on Deep Learning

Hui Wang

doi:10.62051/ijcsit.v5n2.12

Keywords:

Video Emotion Recognition Method, Deep Learning, P3D

Abstract

Video data has pronounced temporal and spatial characteristics, and extracting and recognizing emotional features from it is an active research topic in human-computer interaction. To address the difficulty of dynamic feature extraction and the insufficient fusion of spatiotemporal information in video emotion recognition, this paper proposes an improved model based on the P3D 3D residual network (3Res att Network). First, the P3D backbone is constructed by decoupling the spatiotemporal convolution kernels, which reduces model complexity. Second, a 3D spatial attention mechanism is designed to dynamically focus on emotionally salient regions and enhance feature discriminability. Finally, residual connections are combined with a multi-head attention mechanism to build an end-to-end 3Res att network that achieves deep fusion of spatiotemporal features. Experiments on the eNTERFACE '05 dataset show that the method achieves an accuracy of 67.5% in video emotion recognition, which is 18.2% higher than traditional C3D models and 11.4% higher than 3Res att2. Ablation experiments further validate the effectiveness of the attention module. The approach offers a new technical path for video emotion computing.
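
The complexity reduction from decoupling the spatiotemporal kernel can be illustrated with a simple weight count. The sketch below is not taken from the paper: it assumes the common factorization of a 3x3x3 convolution into a 1x3x3 spatial convolution followed by a 3x1x1 temporal convolution, with equal input and output channels and no bias terms. The channel width of 64 and the function names are hypothetical, chosen only for illustration.

```python
def conv3d_params(c_in, c_out, kernel, bias=False):
    """Count learnable weights of a 3D convolution with kernel (kt, kh, kw)."""
    kt, kh, kw = kernel
    weights = c_in * c_out * kt * kh * kw
    return weights + (c_out if bias else 0)


def full_3d_params(channels, k=3):
    """Weights of a single k x k x k convolution (C3D-style layer)."""
    return conv3d_params(channels, channels, (k, k, k))


def decoupled_params(channels, k=3):
    """Weights of a 1 x k x k spatial conv followed by a k x 1 x 1 temporal conv.

    Assumption: the intermediate channel count equals the input/output count.
    """
    spatial = conv3d_params(channels, channels, (1, k, k))
    temporal = conv3d_params(channels, channels, (k, 1, 1))
    return spatial + temporal


if __name__ == "__main__":
    channels = 64  # hypothetical layer width, not from the paper
    full = full_3d_params(channels)
    decoupled = decoupled_params(channels)
    print(f"full 3x3x3:  {full} weights")
    print(f"decoupled:   {decoupled} weights")
    print(f"ratio:       {decoupled / full:.3f}")
```

Under these assumptions the decoupled pair needs 12 weights per channel pair instead of 27, so the saving is independent of the channel width. The actual layer configuration of the proposed network may differ.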

