Prediction of Infecting Cancer based on Logistic Regression Model
DOI: https://doi.org/10.62051/6yfard30
Keywords: Logistic Regression; Learning Vector Quantization; data normalization; AUC-ROC curve.
Abstract
Cancer is one of the deadliest contributors to the rising mortality rate of humankind, which makes it an important topic of study for the welfare of humanity. However, the traditional manual diagnosis and prognosis of this disease are time-consuming, even for a professional medical practitioner. A model that reliably predicts the state of a tumour (i.e., whether it is probably cancerous) could therefore spare many patients the toxic side effects and additional medical fees incurred by unnecessary treatment. To this end, logistic regression is applied in combination with a machine learning algorithm, Learning Vector Quantization, to derive a predictive model. The model is built in two phases. Phase 1 is the preprocessing of our data from Kaggle, including normalization, classification, and feature selection. Feature selection extracts 14 variables based on their level of importance, and the models are built on these 14 variables and one output Y, taking the value 0 or 1, derived from the classification process. These 14 variables strongly shape the prediction process, and restricting the model to them significantly reduces the work needed. Phase 2 applies the relevant methodology to produce the model and examine its efficiency. To test the ability of the trained logistic models to recognize cancer, we analyzed residual samples that were not used in training, and the models classified them correctly in all cases. The model is evaluated with the AUC-ROC curve and the confusion matrix, two well-established statistical approaches. The calculated AUC value is , which supports the validity and efficiency of the model. In addition, the confusion matrix shows an accuracy of 0.9787 (out of 1).
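The abstract names the pipeline steps (normalization, logistic regression, confusion matrix, AUC-ROC) without giving details. As an illustration only, the following Python sketch assumes min-max normalization, a plain logistic regression fitted by batch gradient descent, and a 0.5 decision threshold. The function names and the six-sample toy dataset are hypothetical; they are not the Kaggle data or the authors' implementation, and the Learning Vector Quantization and 14-variable feature selection steps are not reproduced.

```python
import math


def min_max_normalize(rows):
    """Rescale every feature column to [0, 1]; constant columns map to 0."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [(x - lo) / (hi - lo) if hi > lo else 0.0
         for x, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def sigmoid(z):
    # Numerically stable logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights w and bias b by batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log-loss w.r.t. the linear score is (p - y).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(X, w, b):
    """Predicted probability of class 1 for each row."""
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]


def confusion_counts(y_true, y_pred):
    """Return (TP, FP, FN, TN) for binary labels 0/1."""
    pairs = list(zip(y_true, y_pred))
    return (pairs.count((1, 1)), pairs.count((0, 1)),
            pairs.count((1, 0)), pairs.count((0, 0)))


def roc_auc(y_true, scores):
    """AUC = P(random positive scores above random negative); ties count 1/2."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: two tumour measurements per sample, label 1 = malignant.
raw = [[1.0, 2.0], [2.0, 1.0], [1.5, 1.5], [8.0, 9.0], [9.0, 8.0], [8.5, 8.5]]
y = [0, 0, 0, 1, 1, 1]

X = min_max_normalize(raw)
w, b = fit_logistic(X, y)
probs = predict_proba(X, w, b)
y_pred = [1 if p >= 0.5 else 0 for p in probs]

cm = confusion_counts(y, y_pred)
accuracy = (cm[0] + cm[3]) / len(y)
auc_value = roc_auc(y, probs)
print("confusion (TP, FP, FN, TN):", cm)
print("accuracy:", accuracy, "AUC:", auc_value)
```

For brevity, this toy run scores the same rows it was trained on; on real data the accuracy and AUC should be computed on held-out samples, as the abstract describes. The pairwise AUC formula is equivalent to the area under the ROC curve but is only practical for small samples.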
The results of this model can be used to forecast the probability of cancer from concrete measurements of the tumour. This may reduce the high cost of using certain sophisticated medical equipment, such as X-ray machines. Moreover, it provides foundational statistics for applying modern AI technology to cancer prediction.
Conference Proceedings Volume
License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.