Application of Multiple Machine Learning Models based on Python in Predicting the Risk of Esophageal Cancer

Junhan Zhao

doi:10.62051/r1yhmf35

Keywords:

Machine learning; Python; esophagus cancer; prediction.

Abstract

As health awareness rises and technology advances, machine learning has attracted significant attention in cancer prediction. This study focuses on esophageal cancer, a common digestive tract tumor with high mortality. It aims to evaluate the predictive accuracy of different machine learning models, identify the best-performing models, and examine whether hyperparameter tuning through random search improves the accuracy of the lower-performing models. The results indicate that Random Forest (RF), Gradient Boosting (GB), and Extreme Gradient Boosting (XGBoost) perform best, each reaching an accuracy of 1.00, while K-Nearest Neighbors (KNN) reaches 0.97. Support Vector Classification (SVC) reaches an accuracy of about 0.58, which rises to 0.97 after random-search hyperparameter tuning; logistic regression reaches 0.68, and up to 0.83 after the same tuning. This study provides insights into applying machine learning to cancer prediction, with the potential to improve future diagnostic practices and therapeutic strategies.
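To make the tuning step concrete, the following is a minimal sketch of random-search hyperparameter tuning for an SVC using scikit-learn's `RandomizedSearchCV`. It is an illustration under stated assumptions, not the study's actual pipeline: the data are synthetic (generated with `make_classification`), and the search ranges, the feature scaling step, the number of sampled settings, and the 5-fold cross-validation are all choices made for this example. It is not expected to reproduce the accuracies reported above.

```python
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Assumption: synthetic stand-in data; the study's esophageal cancer
# dataset is not reproduced here.
X, y = make_classification(
    n_samples=400, n_features=10, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Baseline: SVC with default hyperparameters (scaling is an assumption).
baseline = make_pipeline(StandardScaler(), SVC())
baseline.fit(X_train, y_train)
baseline_acc = baseline.score(X_test, y_test)

# Random search: draw a fixed number of hyperparameter settings from the
# distributions below and keep the one with the best cross-validated accuracy.
# The ranges are illustrative assumptions, not values from the study.
param_distributions = {
    "svc__C": loguniform(1e-2, 1e2),
    "svc__gamma": loguniform(1e-4, 1e0),
    "svc__kernel": ["rbf", "linear"],
}
search = RandomizedSearchCV(
    make_pipeline(StandardScaler(), SVC()),
    param_distributions=param_distributions,
    n_iter=20,          # number of sampled settings
    cv=5,               # 5-fold cross-validation on the training split
    scoring="accuracy",
    random_state=0,
)
search.fit(X_train, y_train)

# With the default refit=True, score() uses the best estimator found.
tuned_acc = search.score(X_test, y_test)

print(f"baseline accuracy: {baseline_acc:.2f}")
print(f"tuned accuracy:    {tuned_acc:.2f}")
print("best parameters:", search.best_params_)
```

The same pattern would apply to logistic regression by swapping the estimator and sampling, for example, its regularization strength `C`. Evaluating on a held-out test split, as above, keeps the tuned accuracy from being inflated by the search itself.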
