Research on Financial Loan Default Prediction Based on Multi-Model Ensemble and Custom Thresholds

Jialun Chen

doi:10.62051/7dnjhn18

Keywords:

Loan Default; Risk Assessment; Machine Learning; Ensemble Learning; Threshold Optimization.

Abstract

Loan defaults threaten the stability and reputation of financial institutions. Existing risk assessment models address the problem to some extent but show clear limitations on large-scale, high-dimensional data, so a model that predicts defaults more accurately is needed. This paper proposes a loan default prediction model built on the LendingClub dataset, with the goal of strengthening institutions' risk management and reducing economic losses. The model combines several machine learning algorithms (Logistic Regression, Random Forest, Gradient Boosting, LightGBM, and CatBoost) with ensemble learning methods to improve prediction accuracy and stability. Based on an analysis of precision, recall, and custom evaluation metrics, the paper establishes an optimized combined model that raises recall from 60% to 80% and precision from 28% to 29%. With optimized classification thresholds, the model identifies substantially more bad loans while balancing precision and recall, offering an effective solution for loan default prediction.
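The two ideas named in the abstract, combining several models and replacing the default 0.5 cut-off with a custom threshold, can be illustrated with a minimal sketch. It is not the paper's implementation: the probabilities are made-up toy numbers standing in for the outputs of three fitted models, simple averaging (soft voting) is assumed as the ensemble rule, and the helper names (`average_probabilities`, `precision_recall`, `pick_threshold`) and the "highest threshold that reaches a target recall" rule are hypothetical choices made for illustration.

```python
from typing import List, Sequence, Tuple


def average_probabilities(model_probs: Sequence[Sequence[float]]) -> List[float]:
    """Soft voting: average each loan's predicted default probability across models."""
    n_models = len(model_probs)
    return [sum(per_loan) / n_models for per_loan in zip(*model_probs)]


def precision_recall(
    y_true: Sequence[int], probs: Sequence[float], threshold: float
) -> Tuple[float, float]:
    """Precision and recall for the default class (label 1) at a given threshold."""
    tp = sum(1 for y, p in zip(y_true, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, probs) if p >= threshold and y == 0)
    fn = sum(1 for y, p in zip(y_true, probs) if p < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def pick_threshold(
    y_true: Sequence[int], probs: Sequence[float], min_recall: float = 0.8
) -> float:
    """Scan thresholds from 0.99 down to 0.01 and return the first (highest)
    one whose recall reaches min_recall. This selection rule is an assumption,
    not the custom metric used in the paper."""
    candidates = [t / 100 for t in range(99, 0, -1)]
    for t in candidates:
        _, recall = precision_recall(y_true, probs, t)
        if recall >= min_recall:
            return t
    return candidates[-1]


# Toy data (invented): 1 = default, 0 = repaid; one probability list per model.
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
model_a = [0.90, 0.60, 0.50, 0.35, 0.20, 0.40, 0.30, 0.10, 0.20, 0.10]
model_b = [0.80, 0.70, 0.30, 0.40, 0.10, 0.50, 0.20, 0.20, 0.10, 0.10]
model_c = [0.75, 0.80, 0.45, 0.20, 0.15, 0.35, 0.10, 0.30, 0.35, 0.05]

ensemble = average_probabilities([model_a, model_b, model_c])
threshold = pick_threshold(y_true, ensemble, min_recall=0.8)

for name, t in [("default", 0.5), ("custom", threshold)]:
    precision, recall = precision_recall(y_true, ensemble, t)
    print(f"{name} threshold {t:.2f}: precision={precision:.2f}, recall={recall:.2f}")
```

On this toy data, lowering the threshold from 0.50 to 0.31 raises recall from 0.40 to 0.80 while precision falls from 1.00 to 0.80. The numbers are illustrative only and say nothing about the LendingClub results, but they show the trade-off that threshold optimization has to balance.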
