Lung Cancer Prediction Using Machine Learning: A Comparative Study of Classification Algorithms
DOI:
https://doi.org/10.62051/8wkw1c74
Keywords:
Machine learning; Lung cancer prediction; Classification Algorithms; Data Preprocessing.
Abstract
Lung cancer is a leading cause of death globally and is often diagnosed at an advanced stage due to the lack of early symptoms. Early detection is crucial for improving treatment outcomes and survival rates, and with cases rising worldwide, predicting lung cancer before it becomes critical is a key public health challenge. Symptoms such as persistent cough, shortness of breath, and weight loss can signal the disease but are also common in other conditions, making early prediction vital for timely intervention. This study compares K-Nearest Neighbors (KNN), Random Forest, and Gaussian Naive Bayes (GNB) for lung cancer prediction. KNN and Random Forest achieved the higher accuracy, with KNN reaching 0.87, while GNB performed slightly worse. A feature importance analysis was also conducted to identify the features most critical to lung cancer prediction. The results highlight the effectiveness of machine learning for early lung cancer detection and suggest that better management of the disease can ultimately improve patient outcomes.
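The comparison described in the abstract can be illustrated with a minimal sketch. It assumes scikit-learn and uses synthetic stand-in data from `make_classification`, since the study's dataset and settings are not given here; the sample size, the hyperparameters (`n_neighbors=5`, `n_estimators=200`), the train/test split, and the use of impurity-based importances from the Random Forest are all illustrative assumptions, not the authors' configuration, and the printed scores will not match the reported 0.87.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption): the study's dataset is not reproduced here.
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# The three classifiers named in the abstract; hyperparameters are assumptions.
models = {
    # KNN is distance-based, so features are standardized first.
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "Gaussian NB": GaussianNB(),
}

# Fit each model on the training split and score it on the held-out split.
accuracies = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracies[name] = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: accuracy = {accuracies[name]:.2f}")

# Impurity-based feature importances from the fitted Random Forest,
# ranked from most to least important.
forest = models["Random Forest"]
ranked = sorted(
    enumerate(forest.feature_importances_), key=lambda pair: pair[1], reverse=True
)
for index, score in ranked[:5]:
    print(f"feature_{index}: importance = {score:.3f}")
```

On a real clinical dataset, cross-validation would usually give a more stable comparison than a single hold-out split, and permutation importance is a common alternative to impurity-based importances.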
License
Copyright (c) 2025 Transactions on Computer Science and Intelligent Systems Research

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.







