Optimizing Apache Spark for Healthcare Big Data Management
DOI:
https://doi.org/10.62051/21e8fw65
Keywords:
Healthcare Big Data; Apache Spark; Real-Time Data Processing; Machine Learning.
Abstract
The growth of big data in healthcare calls for real-time data analysis and processing to support medical decision-making. This study explores how Apache Spark, a distributed big data processing framework, can be optimized for healthcare big data management. It assesses Spark's performance on large volumes of healthcare data and its potential for integration with emerging technologies. Using the Medical Information Mart for Intensive Care (MIMIC-III) dataset, the study benchmarks Apache Spark against other analytics tools, focusing on efficiency and effectiveness in the healthcare domain. The methodology includes an in-depth examination of Spark's architecture, Spark Streaming for real-time data processing, and the Machine Learning Library (MLlib) for machine learning tasks. Experimental results demonstrate that Apache Spark significantly improves the quality and efficiency of healthcare services through its high-performance, real-time computational capabilities. The study concludes with directions for future development, emphasizing the need for stronger security and compatibility with evolving healthcare technologies. The work offers a roadmap for optimizing Spark's performance in healthcare analytics while preserving data privacy and security.
License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.