Dynamic Feature Engineering for Breast Cancer Risk Stratification: A Machine Learning System Integrating Clinical Guidelines

Authors

  • Borui Liao

DOI:

https://doi.org/10.62051/p44xjf14

Keywords:

Breast Cancer Screening; Feature Engineering; Precision Oncology.

Abstract

Breast cancer is one of the most common malignancies among women worldwide and poses a major public health challenge because of its high incidence and complex biological characteristics. Breast cancer screening is the cornerstone of tumor prevention and requires the systematic integration of morphological biomarkers and clinical guidelines. This study proposes a dynamic feature engineering framework that encodes tumor biology through nonlinear transformations, including a square-root transformation of tumor radius intended to model cubic volume growth and a risk term that decays across age strata. The radius transformation effectively linearizes the cubic relationship between tumor radius and volume. When evaluated on the Wisconsin Diagnostic Breast Cancer (WDBC) dataset, XGBoost trained on the clinically informed features achieved an AUC of 0.90 (sensitivity = 92%, specificity = 88%), outperforming the linear model by 7.2%. These results emphasize that combining algorithm design with oncological principles can enhance predictive accuracy while reducing unnecessary interventions, providing a blueprint for AI-driven precision oncology.
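The exact formula for the radius transformation did not survive in this copy of the abstract, so the following is only a minimal sketch of the general idea, not the author's implementation. It applies a power transform to a tumor radius and measures how linearly the result tracks the volume of a sphere of that radius. The function names, the spherical-volume assumption, the radius range, and the candidate exponents are all illustrative assumptions; an exponent of 0.5 corresponds to the square root named in the abstract.

```python
import math


def power_feature(radius, exponent):
    """Power transform of a radius; exponent=0.5 is the square root (assumed form)."""
    if radius < 0:
        raise ValueError("radius must be non-negative")
    return radius ** exponent


def sphere_volume(radius):
    """Volume of a sphere, used here as an assumed stand-in for tumor volume."""
    return 4.0 / 3.0 * math.pi * radius ** 3


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Hypothetical radii on an evenly spaced grid (illustrative values, not WDBC data).
radii = [6.0 + 0.5 * i for i in range(45)]
volumes = [sphere_volume(r) for r in radii]

# Correlation between each transformed radius and the volume.
linearity = {
    exponent: pearson([power_feature(r, exponent) for r in radii], volumes)
    for exponent in (0.5, 1.0, 3.0)
}

for exponent, rho in linearity.items():
    print(f"exponent={exponent}: Pearson r with volume = {rho:.4f}")
```

Under the spherical assumption, volume is proportional to the cube of the radius, so the exponent 3 gives a correlation of 1 by construction, while the other exponents only approximate a linear relation over this range. A check of this kind is one way to compare candidate transformations before passing them to a model such as XGBoost.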

Published

10-07-2025

How to Cite

Liao, B. (2025) “Dynamic Feature Engineering for Breast Cancer Risk Stratification: A Machine Learning System Integrating Clinical Guidelines”, Transactions on Computer Science and Intelligent Systems Research, 9, pp. 564–568. doi:10.62051/p44xjf14.