Multimodal Data-Based Text Generation Depression Classification Model

Authors

  • Shukui Ma
  • Pengyuan Ma
  • Shuaichao Feng
  • Fei Ma
  • Guangping Zhuo

DOI:

https://doi.org/10.62051/ijcsit.v5n1.16

Keywords:

Depression Classification, Multimodal, Dual Text Contrastive Learning Module, Joint Multi-modal Fusion Attention

Abstract

Depression classification often relies on multimodal features, but existing models struggle to capture the similarity between multimodal features. Moreover, the social stigma surrounding depression leads to limited availability of datasets, which constrains model accuracy. This study aims to improve multimodal depression recognition methods by proposing a Multimodal Generation-Text Depression Classification Model. The model introduces a Multimodal-Deep-Extract-Feature Net to capture both long- and short-term sequential features. A Dual Text Contrastive Learning Module is employed to generate emotionally salient word embeddings from patients' transcribed text. Contrastive learning brings similar features closer and pushes dissimilar features apart, thereby enhancing the representation of dual-text features. Finally, a Joint Multi-modal Fusion Attention mechanism is proposed to amplify the representation of dominant modalities, effectively integrate all modalities, and capture global multimodal features. This integrated approach improves depression recognition accuracy, facilitating timely intervention and support for patients. The model achieves accuracy rates of 89.5% on the DAIC-Woz dataset and 92% on the MDD2024 dataset.
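The contrastive-learning step described above (pulling similar features together, pushing dissimilar features apart) and the idea of amplifying dominant modalities during fusion can be sketched with a standard margin-based pairwise loss and a softmax weighting. This is a minimal illustrative sketch only: the function names, the toy embeddings, and the norm-based attention score are assumptions, not the paper's actual formulation.

```python
import numpy as np

def contrastive_loss(z1, z2, label, margin=1.0):
    """Margin-based pairwise contrastive loss: pulls embeddings of
    similar pairs (label=1) together and pushes dissimilar pairs
    (label=0) at least `margin` apart."""
    d = np.linalg.norm(z1 - z2)                # Euclidean distance
    if label == 1:
        return 0.5 * d ** 2                    # similar: penalize distance
    return 0.5 * max(0.0, margin - d) ** 2     # dissimilar: penalize closeness

def fused_attention(modalities):
    """Toy stand-in for joint multimodal fusion attention: weight each
    modality embedding by a softmax over its norm, so stronger
    (dominant) modalities contribute more to the fused feature."""
    scores = np.array([np.linalg.norm(m) for m in modalities])
    w = np.exp(scores - scores.max())          # numerically stable softmax
    w /= w.sum()
    return sum(wi * m for wi, m in zip(w, modalities))

# Hypothetical embeddings: two variants of the same utterance (a, b)
# and one unrelated utterance (c).
a = np.array([0.9, 0.1, 0.0])
b = np.array([0.8, 0.2, 0.1])
c = np.array([-0.7, 0.6, 0.3])

print(contrastive_loss(a, b, label=1))   # small: similar pair is already close
print(contrastive_loss(a, c, label=0))   # zero once the pair is beyond the margin
print(fused_attention([a, b, c]))        # convex combination of the three embeddings
```

In practice a model of this kind would compute such a loss over batches of learned embeddings and use learned (rather than norm-based) attention scores; the sketch only shows the push/pull and weighting mechanics.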


References

[1] WOODY C, FERRARI A, SISKIND D, et al. A systematic review and meta-regression of the prevalence and incidence of perinatal depression [J]. 2017, 219: 86-92.

[2] SANTOMAURO D F, HERRERA A M M, SHADID J, et al. Global prevalence and burden of depressive and anxiety disorders in 204 countries and territories in 2020 due to the COVID-19 pandemic [J]. 2021, 398(10312): 1700-12.

[3] MATHERS C D, LONCAR D. Projections of global mortality and burden of disease from 2002 to 2030 [J]. 2006, 3: 2011-30.

[4] EVANS-LACKO S, AGUILAR-GAXIOLA S, AL-HAMZAWI A, et al. Socio-economic variations in the mental health treatment gap for people with anxiety, mood, and substance use disorders: results from the WHO World Mental Health (WMH) surveys [J]. 2018, 48(9): 1560-71.

[5] WANG Q, YANG H, YU Y, et al. Facial expression video analysis for depression detection in Chinese patients [J]. 2018, 57: 228-33.

[6] SENN S, TLACHAC M, FLORES R, et al. Ensembles of BERT for depression classification; proceedings of the 2022 44th Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), 2022 [C]. IEEE.

[7] FLORES R, TLACHAC M, TOTO E, et al. Transfer learning for depression screening from follow-up clinical interview questions [M]. Deep Learning Applications, Volume 4. Springer. 2022: 53-78.

[8] WANG J, RAVI V, ALWAN A. Non-uniform speaker disentanglement for depression detection from raw speech signals; proceedings of Interspeech, 2023 [C]. NIH Public Access.

[9] RAY A, KUMAR S, REDDY R, et al. Multi-level attention network using text, audio and video for depression prediction; proceedings of the 9th International on Audio/Visual Emotion Challenge and Workshop, 2019 [C].

[10] RODRIGUES MAKIUCHI M, WARNITA T, UTO K, et al. Multimodal fusion of BERT-CNN and gated CNN representations for depression detection; proceedings of the 9th International on Audio/Visual Emotion Challenge and Workshop, 2019 [C].

[11] CAO Y, HAO Y, LI B, et al. Depression prediction based on BiAttention-GRU [J]. 2022, 13(11): 5269-77.

[12] BUCUR A-M, COSMA A, ROSSO P, et al. It's just a matter of time: Detecting depression with time-enriched multimodal transformers; proceedings of the European Conference on Information Retrieval, 2023 [C]. Springer.

[13] RAHMAN A B S, TA H-T, NAJJAR L, et al. DepressionEmo: A novel dataset for multilabel classification of depression emotions [J]. 2024, 366: 445-58.

[14] GIMENO-GÓMEZ D, BUCUR A-M, COSMA A, et al. Reading Between the Frames: Multi-modal Depression Detection in Videos from Non-verbal Cues; proceedings of the European Conference on Information Retrieval, 2024 [C]. Springer.

[15] MCGINNIS E W, ANDERAU S P, HRUSCHAK J, et al. Giving voice to vulnerable children: machine learning analysis of speech detects anxiety and depression in early childhood [J]. 2019, 23(6): 2294-301.

[16] XU X, WANG Y, WEI X, et al. Attention-based acoustic feature fusion network for depression detection [J]. 2024, 601: 128209.

[17] HAQUE A, GUO M, MINER A S, et al. Measuring depression symptom severity from spoken language and 3D facial expressions [J]. 2018.

[18] LIU Y, OTT M, GOYAL N, et al. RoBERTa: A robustly optimized BERT pretraining approach [J]. arXiv preprint arXiv:1907.11692, 2019.

[19] SERMANET P, LYNCH C, CHEBOTAR Y, et al. Time-contrastive networks: Self-supervised learning from video; proceedings of the 2018 IEEE International Conference on Robotics and Automation (ICRA), 2018 [C]. IEEE.

[20] YU Y, SI X, HU C, et al. A review of recurrent neural networks: LSTM cells and network architectures [J]. 2019, 31(7): 1235-70.

[21] GLM TEAM, ZENG A, XU B, et al. ChatGLM: A family of large language models from GLM-130B to GLM-4 All Tools [J]. 2024.

[22] KROENKE K, STRINE T W, SPITZER R L, et al. The PHQ-8 as a measure of current depression in the general population [J]. 2009, 114(1-3): 163-73.

[23] AMOS B, LUDWICZUK B, SATYANARAYANAN M. OpenFace: A general-purpose face recognition library with mobile applications [J]. 2016, 6(2): 20.

[24] EYBEN F, WÖLLMER M, SCHULLER B. openSMILE: The Munich versatile and fast open-source audio feature extractor; proceedings of the 18th ACM International Conference on Multimedia, 2010 [C].

[25] KIM T, VOSSEN P. EmoBERTa: Speaker-aware emotion recognition in conversation with RoBERTa [J]. 2021.

[26] SHEN Y, YANG H, LIN L. Automatic depression detection: An emotional audio-textual corpus and a GRU/BiLSTM-based model; proceedings of ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022 [C]. IEEE.

[27] CHEN Z, DENG J, ZHOU J, et al. Depression detection in clinical interviews with LLM-empowered structural element graph; proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), 2024 [C].

[28] ZOU B, HAN J, WANG Y, et al. Semi-structural interview-based Chinese multimodal depression corpus towards automatic preliminary screening of depressive disorders [J]. 2022, 14(4): 2823-38.

Published

23-01-2025

Section

Articles

How to Cite

Ma, S., Ma, P., Feng, S., Ma, F., & Zhuo, G. (2025). Multimodal Data-Based Text Generation Depression Classification Model. International Journal of Computer Science and Information Technology, 5(1), 175-193. https://doi.org/10.62051/ijcsit.v5n1.16