Heart disease prediction utilizing machine learning techniques

Authors

  • Litian Chen

DOI:

https://doi.org/10.62051/e054hq43

Keywords:

Heart disease prediction; Machine learning algorithms; Comparative analysis.

Abstract

Heart disease is an international public health issue and a significant health risk for many people. The World Health Organization (WHO) has identified heart disease as one of the leading causes of death. Owing to its rapid development and wide application, machine learning has become an effective technique for predicting heart disease. Combined with large-scale medical data and advanced algorithms, machine learning can help healthcare professionals predict the risk of heart disease accurately, enabling early intervention and treatment for patients. This paper uses the “heart_2020_cleaned.csv” dataset, which contains 319795 instances and 18 attributes; 70% of the instances were randomly selected for the training set and the remaining 30% for the test set. Four data-mining algorithms, namely Decision Tree (DT), LightGBM, Random Forest (RF) and Logistic Regression (LR), were applied to predict heart disease. Before the models were constructed, data cleaning, feature selection and hyperparameter tuning were performed in order to explore the potential patterns in the data. A comparative analysis on the external test set then compared the predictive performance of the different models under the same conditions. The results show that LightGBM achieved the highest accuracy (76.9%), followed by Logistic Regression and Random Forest, while Decision Tree performed worst.
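The evaluation protocol described above (a random 70/30 split, every model trained on the same training portion and ranked by accuracy on the same held-out portion) can be sketched in plain Python as follows. This is a hedged illustration, not the author's code: the function names, the fixed seed, the toy data and the two stand-in learners (a majority-class baseline and a one-feature decision stump) are assumptions made for illustration. The paper's actual models (DT, LightGBM, RF, LR) could be wrapped to fit the same hypothetical `trainers` mapping, for example around scikit-learn or LightGBM estimators.

```python
import random
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical helper names and signatures; they are NOT taken from the paper.
Row = Sequence[float]
Predictor = Callable[[Row], int]                        # feature row -> 0/1 label
Trainer = Callable[[List[Row], List[int]], Predictor]   # hypothetical "fit" signature
Split = Tuple[List[Row], List[int]]


def holdout_split(rows: Sequence[Row], labels: Sequence[int],
                  train_pct: int = 70, seed: int = 0) -> Tuple[Split, Split]:
    """Randomly put train_pct percent of instances in training, the rest in testing."""
    if len(rows) != len(labels):
        raise ValueError("rows and labels must have the same length")
    order = list(range(len(rows)))
    random.Random(seed).shuffle(order)          # fixed seed -> reproducible split
    n_train = len(order) * train_pct // 100     # integer arithmetic, no float rounding

    def pick(idx: List[int]) -> Split:
        return [rows[i] for i in idx], [labels[i] for i in idx]

    return pick(order[:n_train]), pick(order[n_train:])


def accuracy(predict: Predictor, rows: Sequence[Row], labels: Sequence[int]) -> float:
    """Fraction of rows whose predicted label equals the true label."""
    hits = sum(1 for x, y in zip(rows, labels) if predict(x) == y)
    return hits / len(labels)


def majority_baseline(rows: List[Row], labels: List[int]) -> Predictor:
    """Stand-in learner: always predict the most frequent training label."""
    most_common = Counter(labels).most_common(1)[0][0]
    return lambda x: most_common


def decision_stump(rows: List[Row], labels: List[int]) -> Predictor:
    """Stand-in learner: depth-1 tree on feature 0, threshold chosen by training accuracy."""
    best_t, best_acc = rows[0][0], -1.0
    for t in sorted({r[0] for r in rows}):
        acc = accuracy(lambda x, t=t: int(x[0] >= t), rows, labels)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return lambda x: int(x[0] >= best_t)


def compare(trainers: Dict[str, Trainer], rows: Sequence[Row], labels: Sequence[int],
            train_pct: int = 70, seed: int = 0) -> List[Tuple[str, float]]:
    """Fit every model on the same training split and score it on the same test split."""
    (tr_x, tr_y), (te_x, te_y) = holdout_split(rows, labels, train_pct, seed)
    scores = [(name, accuracy(fit(tr_x, tr_y), te_x, te_y)) for name, fit in trainers.items()]
    return sorted(scores, key=lambda s: s[1], reverse=True)   # best model first


if __name__ == "__main__":
    # Synthetic toy data, NOT the heart_2020 dataset: label is 1 when the feature >= 50.
    data = [[float(v)] for v in range(100)]
    target = [int(r[0] >= 50) for r in data]
    ranking = compare({"stump": decision_stump, "majority": majority_baseline}, data, target)
    for name, acc in ranking:
        print(f"{name}: {acc:.3f}")
```

With 319795 instances, `train_pct=70` gives 223856 training and 95939 test instances. Because every model is fitted and scored on the same split, differences in accuracy reflect the models rather than the sampling; the majority baseline is a cheap check that a reported accuracy exceeds what always predicting the most common class would achieve.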

Published

24-03-2024

How to Cite

Chen, L. (2024). Heart disease prediction utilizing machine learning techniques. Transactions on Materials, Biotechnology and Life Sciences, 3, 35-50. https://doi.org/10.62051/e054hq43