Comparative Analysis Research on Machine Learning Models in Credit Risk Assessment

Bohan Zhang

doi:10.62051/19wa7a05

Keywords:

Credit Risk Assessment; SMOTE; Machine Learning; SHAP Interpretability; Cross-Dataset Validation.

Abstract

Credit risk assessment is crucial to risk management and control in financial institutions, but it faces challenges such as class imbalance, complex features, and a lack of model interpretability. This study used two public datasets, "Give Me Some Credit" and "Loan Default". The Synthetic Minority Over-Sampling Technique (SMOTE) was employed to balance the sample distribution, and feature engineering constructed new features, such as the income-debt ratio (Income_Debt_Ratio), to reduce variable redundancy. The study then compared the performance of logistic regression, Random Forest (RF), XGBoost, and LightGBM, and improved training efficiency. The experimental results show that the ensemble models (XGBoost, LightGBM) outperform the traditional models on both datasets, with an average accuracy of 94% and an AUC of 0.98. Furthermore, SHapley Additive exPlanations (SHAP) values were used for interpretability analysis. This study provides credit institutions with a high-precision, interpretable model construction scheme and verifies the model's generalization ability through cross-dataset validation, laying a theoretical and practical foundation for future credit risk control and the construction of an integrated system.
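The SMOTE balancing step named in the abstract can be illustrated with a minimal sketch. This is not the study's code: the function name `smote_oversample`, the toy two-feature data, and the parameters `k` and `seed` are assumptions made for illustration only. It shows the core idea of SMOTE, which is to create each synthetic minority sample at a random point on the line segment between a minority sample and one of its nearest minority neighbours.

```python
import math
import random


def smote_oversample(minority, n_new, k=3, seed=0):
    """Return n_new synthetic minority samples (hypothetical helper).

    Each synthetic sample is interpolated between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Pick a random point on the segment between base and neighbour.
        gap = rng.random()
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy data (assumed): 8 non-default rows and 3 default rows, two features each.
majority = [(float(x), float(y)) for x in range(4) for y in range(2)]
minority = [(5.0, 5.0), (6.0, 5.5), (5.5, 6.5)]

# Generate enough synthetic defaults to match the majority class size.
new_rows = smote_oversample(minority, n_new=len(majority) - len(minority))
balanced_minority = minority + new_rows
print(len(majority), len(balanced_minority))
```

In practice, oversampling of this kind is usually applied to the training split only, so that the evaluation data keep their original class distribution.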
