Comparative Analysis of Traditional Statistical and Machine Learning Approaches in Credit Scoring Applications

Leung Hon Sum

doi:10.62051/9fcack29

Keywords:

Traditional statistical methods; logistic regression; machine learning; support vector machines; credit scoring.

Abstract

This paper compares the performance of traditional statistical approaches, such as logistic regression (LR), with that of machine learning approaches, such as support vector machines (SVMs), in credit scoring. A dataset of borrower characteristics is simulated, covering income, wealth, repayment history, length of the requested loan, and total debt; the dataset can be altered to represent different macroeconomic scenarios. Both LR and SVM models are fitted and tested on this dataset in Mathematica and are evaluated on accuracy, precision, and recall. The results show that the SVM consistently outperforms LR: recall is 31.8% higher and precision is 15.6% higher, while accuracy is only 4.6% higher. This suggests that banks should consider adopting machine learning methods for credit scoring, provided they have access to large datasets and sufficient computational power. Traditional approaches such as LR should not be dismissed, however, because they offer transparency and interpretability, which are essential for financial institutions as regulated entities.
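The three evaluation metrics named in the abstract can be illustrated with a minimal Python sketch. It is not the paper's Mathematica code: the `Confusion` class, the helper functions, and the two sets of counts are hypothetical, and treating "default" as the positive class is an assumption. The counts are invented for illustration only and are not the paper's results; they show how two classifiers can differ widely in recall while differing little in accuracy when positive cases are rare.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Confusion:
    """Confusion-matrix counts for a binary classifier.

    Assumption: the positive class (label 1) is 'borrower defaults'.
    """
    tp: int  # positives predicted as positive
    fp: int  # negatives predicted as positive
    fn: int  # positives predicted as negative
    tn: int  # negatives predicted as negative


def confusion_from_labels(y_true: Sequence[int], y_pred: Sequence[int]) -> Confusion:
    """Count the four outcomes from paired true and predicted 0/1 labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    return Confusion(
        tp=sum(1 for t, p in pairs if t == 1 and p == 1),
        fp=sum(1 for t, p in pairs if t == 0 and p == 1),
        fn=sum(1 for t, p in pairs if t == 1 and p == 0),
        tn=sum(1 for t, p in pairs if t == 0 and p == 0),
    )


def accuracy(c: Confusion) -> float:
    """Share of all cases classified correctly."""
    return (c.tp + c.tn) / (c.tp + c.fp + c.fn + c.tn)


def precision(c: Confusion) -> float:
    """Share of predicted positives that are truly positive."""
    return c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0


def recall(c: Confusion) -> float:
    """Share of true positives that the model finds."""
    return c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0


# Illustrative counts only (NOT from the paper): 1000 borrowers, 100 positives.
model_a = Confusion(tp=40, fp=20, fn=60, tn=880)
model_b = Confusion(tp=70, fp=25, fn=30, tn=875)

for name, c in (("model_a", model_a), ("model_b", model_b)):
    print(f"{name}: accuracy={accuracy(c):.3f} "
          f"precision={precision(c):.3f} recall={recall(c):.3f}")
```

Because accuracy is dominated by the large negative class, a classifier that finds many more of the rare positive cases gains far more in recall than in accuracy, which is one plausible reading of the pattern the abstract reports.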
