Predicting Heart Disease Risk Using Machine Learning: A Comparative Analysis of Linear and Nonlinear Models

Xuchong Su

doi:10.62051/9qd8ax37

Keywords:

Heart disease; machine learning; artificial intelligence.

Abstract

Heart disease is one of the leading causes of mortality worldwide, and early risk prediction plays a vital role in reducing its impact. Traditional assessment methods such as the Framingham Risk Score are widely used but rely on linear assumptions, which can overlook complex interactions between clinical factors. Machine learning (ML) offers a promising alternative because it can model these nonlinear relationships. This study compares the predictive performance of two interpretable ML models, Logistic Regression (linear) and Random Forest (nonlinear), on a clinical dataset of 918 patient records. The dataset includes key features such as age, sex, cholesterol, resting blood pressure, and heart rate. The Random Forest model slightly outperforms Logistic Regression in accuracy (90.2% vs. 88.6%) and AUC (93.5% vs. 92.9%), while both models achieve high recall (93.1%), which is critical for minimizing missed diagnoses. Feature importance analysis using SHAP values identifies MaxHR, ST_Slope, and cholesterol as key predictors. This study highlights the potential of accessible, interpretable ML methods to support clinical decision-making in cardiovascular care while ensuring transparency and reproducibility.
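The three metrics reported in the abstract (accuracy, recall, and AUC) can be illustrated with a minimal sketch. The labels and scores below are invented toy values, not the study's data, and the helper names `accuracy`, `recall`, and `roc_auc` are hypothetical. The sketch assumes a decision threshold of 0.5, which the abstract does not specify.

```python
# Toy illustration only: labels and scores are made up, not study data.

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall(y_true, y_pred):
    """True positives divided by all actual positives (sensitivity)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn)

def roc_auc(y_true, scores):
    """Probability that a random positive case is scored above a random
    negative case, counting ties as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# 1 = heart disease present, 0 = absent (hypothetical patients)
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
# Hypothetical predicted risk from some classifier
scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]
# Assumed decision threshold of 0.5
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print(f"accuracy = {accuracy(y_true, y_pred):.3f}")
print(f"recall   = {recall(y_true, y_pred):.3f}")
print(f"AUC      = {roc_auc(y_true, scores):.3f}")
```

In this toy case one diseased patient (score 0.4) falls below the threshold. That false negative is what recall penalizes, which is why the abstract treats recall as critical for minimizing missed diagnoses. AUC, by contrast, depends only on how the scores rank patients and not on the chosen threshold.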
