Enhancing Multimodal Emotion Analysis through Fusion with EMT Model Based on BBFN

Wenshuo Wang

doi:10.62051/gfeqm854


Keywords:

Multimodal sentiment analysis; multimodal fusion; cross-modal processing.

Abstract

Sentiment analysis, a key technology in natural language processing, is widely used in fields such as medicine, film, and television. Integrating multimodal data is particularly important for improving its accuracy. This paper presents a fusion strategy that combines the Efficient Multimodal Transformer (EMT) model with the Bi-Bimodal Fusion Network (BBFN) for emotion analysis. By integrating these two models, the research aims to improve the efficiency and precision of sentiment analysis on multimodal datasets by emphasizing global-local cross-modal interactions. In experiments on the MOSI dataset, the integrated model improves on key metrics, including accuracy, correlation coefficient, and Mean Absolute Error (MAE). The integrated model outperforms existing models, highlighting the importance of holistic modal fusion in understanding human emotions.
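The three metrics named in the abstract can be illustrated with a minimal sketch. It assumes MOSI-style continuous sentiment scores, and it assumes binary accuracy is taken by splitting scores at zero (negative versus non-negative); the paper's exact evaluation protocol is not stated here, and the sample values are hypothetical, not results from the paper.

```python
import math


def mean_absolute_error(preds, labels):
    """Average absolute difference between predicted and true scores."""
    return sum(abs(p - y) for p, y in zip(preds, labels)) / len(labels)


def pearson_correlation(preds, labels):
    """Pearson correlation coefficient between predictions and labels."""
    n = len(labels)
    mean_p = sum(preds) / n
    mean_y = sum(labels) / n
    cov = sum((p - mean_p) * (y - mean_y) for p, y in zip(preds, labels))
    var_p = sum((p - mean_p) ** 2 for p in preds)
    var_y = sum((y - mean_y) ** 2 for y in labels)
    return cov / math.sqrt(var_p * var_y)


def binary_accuracy(preds, labels):
    """Share of samples whose predicted polarity matches the label.

    Assumption: polarity is negative (< 0) versus non-negative (>= 0).
    """
    hits = sum((p >= 0) == (y >= 0) for p, y in zip(preds, labels))
    return hits / len(labels)


# Hypothetical scores for illustration only.
labels = [2.4, -1.0, 0.6, -2.2]
preds = [1.8, -0.4, -0.2, -2.0]

print("MAE: ", mean_absolute_error(preds, labels))
print("Corr:", pearson_correlation(preds, labels))
print("Acc: ", binary_accuracy(preds, labels))
```

Lower MAE is better, while higher correlation and accuracy are better, which is why the three are usually reported together.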


References

Tadas Baltrušaitis, Chaitanya Ahuja, Louis-Philippe Morency. Multimodal Machine Learning: A Survey and Taxonomy. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, 41: 423–443.

Wei Han, Hui Chen, Alexander Gelbukh, et al. Bi-Bimodal Modality Fusion for Correlation-Controlled Multimodal Sentiment Analysis. ICMI '21: Proceedings of the 2021 International Conference on Multimodal Interaction, 2021.

Md Shad Akhtar, Dushyant Singh Chauhan, Deepanway Ghosal, et al. Multi-task Learning for Multi-modal Emotion Recognition and Sentiment Analysis. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2019, 1: 370–379.

Wasifur Rahman, Md. Kamrul Hasan, Sangwu Lee, et al. Integrating multimodal information in large pretrained transformers. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020, 2359–2369.

Hai Pham, Paul Pu Liang, Thomas Manzini, et al. Found in translation: Learning robust joint representations by cyclic translations between modalities. In Proceedings of the AAAI Conference on Artificial Intelligence, 2019, 33: 6892–6899.

Licai Sun, Zheng Lian, Bin Liu, Jianhua Tao. Efficient Multimodal Transformer with Dual-Level Feature Restoration for Robust Multimodal Sentiment Analysis. IEEE Transactions on Affective Computing, 2023, 1–17.

Amir Zadeh, Rowan Zellers, Eli Pincus, Louis-Philippe Morency. Multimodal sentiment intensity analysis in videos: Facial gestures and verbal messages. IEEE Intelligent Systems, 2016, 31(6): 82–88.

Feiyang Chen, Ziqian Luo, Yanyan Xu, Dengfeng Ke. Complementary Fusion of Multi-Features and Multi-Modalities in Sentiment Analysis. arXiv:1904.08138, 2019.

Zhun Liu, Ying Shen, Varun Bharadhwaj Lakshminarasimhan, et al. Efficient Low-rank Multimodal Fusion With Modality-Specific Factors. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, 2018, 1: 2247–2256.

Amir Zadeh, Minghai Chen, Soujanya Poria, et al. Tensor Fusion Network for Multimodal Sentiment Analysis. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, 2017, 1103–1114.

Yao-Hung Hubert Tsai, Paul Pu Liang, Amir Zadeh, et al. Learning Factorized Multimodal Representations. In International Conference on Learning Representations, 2019, 6558–6569.

Zhongkai Sun, Prathusha Sarma, William Sethares, Yingyu Liang. Learning relationships between text, audio, and video via deep canonical correlation for multimodal language analysis. In Proceedings of the AAAI Conference on Artificial Intelligence, 2020, 34: 8992–8999.

Yao-Hung Hubert Tsai, Shaojie Bai, Paul Pu Liang, et al. Multimodal Transformer for Unaligned Multimodal Language Sequences. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, 2019.

Devamanyu Hazarika, Roger Zimmermann, Soujanya Poria. MISA: Modality-Invariant and -Specific Representations for Multimodal Sentiment Analysis. In Proceedings of the 28th ACM International Conference on Multimedia, 2020, 1122–1131.

Wenmeng Yu, Hua Xu, Ziqi Yuan, Jiele Wu. Learning modality-specific representations with self-supervised multi-task learning for multimodal sentiment analysis. In Proceedings of the AAAI Conference on Artificial Intelligence, 2021, 35: 10790–10797.

Wei Han, Hui Chen, Soujanya Poria. Improving multimodal fusion with hierarchical mutual information maximization for multimodal sentiment analysis. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021, 9180–9192.

Ya Sun, Sijie Mai, Haifeng Hu. Learning to learn better unimodal representations via adaptive multimodal meta-learning. IEEE Transactions on Affective Computing, 2022.

Ziqi Yuan, Wei Li, Hua Xu, Wenmeng Yu. Transformer-based feature reconstruction network for robust multimodal sentiment analysis. In Proceedings of the 29th ACM International Conference on Multimedia, 2021, 4400–4407.