Comparative Analysis Research on Machine Learning Models in Credit Risk Assessment
DOI:
https://doi.org/10.62051/19wa7a05
Keywords:
Credit Risk Assessment; SMOTE; Machine Learning; SHAP Interpretability; Cross-Dataset Validation.
Abstract
Credit risk assessment is central to risk management and control at financial institutions, but it faces challenges such as class imbalance, complex features, and limited model interpretability. This study used two public datasets, "Give Me Some Credit" and "Loan Default". The Synthetic Minority Over-Sampling Technique (SMOTE) was employed to balance the class distribution, and feature engineering constructed new features such as the income-debt ratio (Income_Debt_Ratio) to reduce variable redundancy. The study also compared the performance of several models, including logistic regression and Random Forest (RF), and improved training efficiency. The experimental results show that the ensemble models (XGBoost, LightGBM) outperform the traditional models on both datasets, with an average accuracy of 94% and an AUC of 0.98. Furthermore, SHapley Additive exPlanations (SHAP) values were used for interpretability analysis. This study provides credit institutions with a high-precision, interpretable model construction scheme and verifies the generalization ability of the model through cross-dataset validation, laying a theoretical and practical foundation for future credit risk control and the construction of an integrated system.
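To make the resampling and feature-construction steps named in the abstract concrete, the following is a minimal pure-Python sketch, not the authors' implementation. The function names, the formula for Income_Debt_Ratio (income divided by debt), the `eps` guard, and the toy data are all assumptions made for illustration; the abstract does not specify them. In practice a maintained library such as imbalanced-learn would normally be used for SMOTE.

```python
import random


def income_debt_ratio(income, debt, eps=1e-9):
    """Assumed definition of the derived feature: income / debt.

    eps is a hypothetical guard against division by zero.
    """
    return income / (debt + eps)


def smote(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic minority rows in the spirit of SMOTE.

    Each synthetic row lies on the segment between a randomly chosen
    minority row and one of its k nearest minority neighbours.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority rows by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Toy minority-class rows (two features each); values are made up.
minority_rows = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_rows = smote(minority_rows, n_new=6)
ratio = income_debt_ratio(5000.0, 2500.0)
```

Because every synthetic row is a convex combination of two real minority rows, it stays inside the range already covered by the minority class, which is the property that distinguishes SMOTE from simple duplication of minority samples.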
License
Copyright (c) 2025 Transactions on Computer Science and Intelligent Systems Research

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.







