Predicting Student Depression Using Machine Learning

Authors

  • Bo Peng

DOI:

https://doi.org/10.62051/5x88k313

Keywords:

Machine Learning; Student Depression; Model Evaluation.

Abstract

The rising prevalence of depression among students has drawn significant attention. As data mining and artificial intelligence technologies evolve, using behavioral and textual data for early depression prediction offers new opportunities for timely mental health intervention. This study uses 17 depression-related features, including gender, financial stress, academic pressure, and working hours, and compares three representative classification algorithms: logistic regression, known for its interpretability; random forest, which leverages ensemble learning; and XGBoost, a powerful gradient boosting framework. Model performance was assessed with several quantitative indicators: accuracy (the proportion of correct predictions), precision (the proportion of predicted positives that are true positives), recall (the proportion of actual positives that are identified), the F1 score (the harmonic mean of precision and recall), and the area under the ROC curve (AUC), which reflects overall classification capability. A 50-fold robustness test was also conducted to validate model stability, and SHAP plots were used to interpret model predictions and to identify the features that contribute most to depression risk. Logistic regression achieved the highest AUC (0.913), compared with 0.906 for random forest and 0.903 for XGBoost. Calibration curve analysis showed that logistic regression also had the best calibration performance. These results support the feasibility and value of logistic regression for predicting student depression.
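The evaluation indicators named in the abstract can be illustrated with a minimal sketch in plain Python. This is not the study's code: the function names, the 0.5 decision threshold, and the toy labels and scores are all hypothetical, chosen only to show how accuracy, precision, recall, F1, and AUC are computed from true labels and predicted probabilities.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels coded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def classification_metrics(y_true, y_score, threshold=0.5):
    """Accuracy, precision, recall, and F1 at an assumed decision threshold."""
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def roc_auc(y_true, y_score):
    """AUC as the probability that a random positive outscores a random
    negative (ties count as one half)."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy data: 1 = depressed, 0 = not depressed.
    y_true = [1, 1, 1, 0, 0, 0, 1, 0]
    y_score = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.7, 0.1]
    print(classification_metrics(y_true, y_score))
    print("auc:", roc_auc(y_true, y_score))
```

The pairwise AUC computation is quadratic in the number of samples, which is acceptable for illustration; a library routine would normally be used on a full dataset.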

Published

10-07-2025

How to Cite

Peng, B. (2025) “Predicting Student Depression Using Machine Learning”, Transactions on Computer Science and Intelligent Systems Research, 9, pp. 305–312. doi:10.62051/5x88k313.