Multimodal Emotion Recognition Based on Deep Learning

Authors

  • Jiali Jiang

DOI:

https://doi.org/10.62051/ijcsit.v5n2.10

Keywords:

Emotion Recognition, Multi-task Learning, Multi-head Attention Mechanism, Neural Networks

Abstract

In recent years, multi-task joint analysis of multiple affective phenomena has become an important research topic in natural language processing and artificial intelligence. This approach aims to identify the multiple emotion categories expressed in discourse by integrating multimodal information and exploiting knowledge shared across related tasks. Sentiment analysis, emotion recognition, and sarcasm detection are three closely interconnected tasks in affective computing. This paper focuses on these three tasks and addresses current challenges in their study. The work comprises three parts:

(1) Because the lack of suitable datasets has limited the development of Chinese multi-task learning models, this paper constructs a Chinese multi-task, multimodal dialogue emotion corpus to support multi-task multimodal sentiment analysis. The dataset is annotated with labels for multiple tasks (sentiment, emotion, sarcasm, humor, etc.) and, for the first time, manually annotates the correlation between sentiment and emotion and between sarcasm and humor. Systematic evaluation and analysis show that the dataset is of high quality and representative.

(2) Building on this dataset, this paper proposes a multimodal sentiment analysis model based on multi-task learning that addresses three aspects: contextual interaction, multimodal feature fusion, and multi-task learning. Experimental evaluation demonstrates the effectiveness of the model.

(3) Because the model proposed in (2) does not consider the interrelationships between tasks, this paper further presents a multi-task learning model based on soft parameter sharing that learns both the commonalities and the differences between tasks. Experimental comparisons with other state-of-the-art baselines demonstrate the novelty and efficiency of the proposed method.
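Point (3) relies on soft parameter sharing: each task keeps its own parameters, and sharing is encouraged rather than enforced. The abstract does not say how the paper implements this, so the following is only a minimal sketch of one common form, an L2 penalty on the distance between two task-specific weight vectors, written in plain Python. The two linear task heads, the feature vector `x`, the regression targets, the hyperparameters `lam` and `lr`, and all helper functions are hypothetical stand-ins, not the paper's model.

```python
from typing import List, Tuple

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def soft_sharing_penalty(w_a: Vector, w_b: Vector, lam: float) -> float:
    """lam * ||w_a - w_b||^2: pulls two task-specific weight vectors together
    without forcing them to be identical (hard sharing would use one vector)."""
    return lam * sum((a - b) ** 2 for a, b in zip(w_a, w_b))


def total_loss(w_a: Vector, w_b: Vector, x: Vector,
               y_a: float, y_b: float, lam: float) -> float:
    """Squared-error loss of two linear task heads plus the soft-sharing penalty."""
    loss_a = (dot(w_a, x) - y_a) ** 2
    loss_b = (dot(w_b, x) - y_b) ** 2
    return loss_a + loss_b + soft_sharing_penalty(w_a, w_b, lam)


def train(w_a: Vector, w_b: Vector, x: Vector, y_a: float, y_b: float,
          lam: float, lr: float, steps: int) -> Tuple[Vector, Vector]:
    """Plain gradient descent on total_loss; returns the updated weights."""
    w_a, w_b = list(w_a), list(w_b)
    for _ in range(steps):
        err_a = dot(w_a, x) - y_a
        err_b = dot(w_b, x) - y_b
        # Gradient = task-specific term + penalty term pulling towards the other head.
        g_a = [2 * err_a * xi + 2 * lam * (a - b) for a, b, xi in zip(w_a, w_b, x)]
        g_b = [2 * err_b * xi + 2 * lam * (b - a) for a, b, xi in zip(w_a, w_b, x)]
        w_a = [a - lr * g for a, g in zip(w_a, g_a)]
        w_b = [b - lr * g for b, g in zip(w_b, g_b)]
    return w_a, w_b


def gap(w_a: Vector, w_b: Vector) -> float:
    """Euclidean distance between the two heads' weights."""
    return sum((a - b) ** 2 for a, b in zip(w_a, w_b)) ** 0.5


if __name__ == "__main__":
    x = [1.0, 0.5]              # hypothetical fused multimodal feature vector
    y_task_a, y_task_b = 1.0, -1.0  # hypothetical targets for two related tasks
    for lam in (0.1, 5.0):
        w_a, w_b = train([0.0, 0.0], [0.0, 0.0], x, y_task_a, y_task_b,
                         lam=lam, lr=0.05, steps=200)
        loss = total_loss(w_a, w_b, x, y_task_a, y_task_b, lam)
        print(f"lam={lam}: loss={loss:.3f}, gap={gap(w_a, w_b):.3f}")
```

In this sketch, `lam = 0` gives fully independent task heads, while a very large `lam` drives the two weight vectors together and approaches hard parameter sharing; intermediate values let each task keep what differs while still borrowing what is common.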

Published

28-02-2025

Issue

Vol. 5, No. 2 (2025)

Section

Articles

How to Cite

Jiang, J. (2025). Multimodal Emotion Recognition Based on Deep Learning. International Journal of Computer Science and Information Technology, 5(2), 71-80. https://doi.org/10.62051/ijcsit.v5n2.10