Performance Comparison of GPipe-Optimized VGG16 and LeNet Networks for Malaria Microscopy Image Classification

Zhentao Lu

doi:10.62051/q9pv4x06

Keywords:

Deep learning; Medical image classification; Malaria microscopy; Model parallelism; GPipe.

Abstract

This study addresses the memory–accuracy trade-off that limits deep-learning malaria diagnostics on commodity hardware. A set of 27,558 high-quality thin-smear images was curated from the public NLM corpus, split 80/20 into training and test sets, and used to benchmark two convolutional architectures. A compact LeNet was trained on a single RTX 2080 Ti, whereas VGG16 was trained both in single-GPU mode and with GPipe pipeline parallelism across two, three, and four identical GPUs. The models were evaluated on accuracy, F1-score, ROC-AUC, throughput, and peak memory. LeNet converged in 10 epochs to 77% accuracy and an F1-score of 0.78 while consuming only 1.9 GB of memory, but its capacity proved insufficient for reliable diagnosis. VGG16 reached 96% accuracy, an F1-score of 0.96, and an AUC of 0.99, yet required 22.7 GB on a single card. GPipe partitioned the model across GPUs, cutting per-GPU memory to about 3 GB with negligible accuracy loss; communication overhead, however, limited throughput scaling. These findings indicate that pipeline model parallelism enables state-of-the-art performance on affordable multi-GPU rigs, and they outline optimizations, such as finer stage granularity and adaptive batch sizing, to further accelerate deployment in low-resource laboratories. The proposed workflow offers a transferable template for other medium-scale medical imaging tasks.
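
To make the pipeline-parallel idea concrete, the following is a minimal sketch in plain Python of an idealized GPipe-style forward schedule. It is not the author's training code: the function names are hypothetical, and the model assumes every stage takes the same time per micro-batch and that communication is free. It therefore captures only the pipeline idle time ("bubble"), (K - 1) / (M + K - 1) for K stages and M micro-batches, and not the communication overhead reported above.

```python
from fractions import Fraction


def gpipe_forward_schedule(num_stages, num_microbatches):
    """Return, for each clock tick, the (stage, microbatch) pairs that run concurrently.

    Idealized assumption: every stage needs exactly one tick per micro-batch.
    """
    ticks = []
    for t in range(num_stages + num_microbatches - 1):
        # Stage s works on micro-batch (t - s) once that micro-batch has reached it.
        active = [
            (s, t - s)
            for s in range(num_stages)
            if 0 <= t - s < num_microbatches
        ]
        ticks.append(active)
    return ticks


def bubble_fraction(num_stages, num_microbatches):
    """Idle fraction of the idealized pipeline: (K - 1) / (M + K - 1)."""
    return Fraction(num_stages - 1, num_microbatches + num_stages - 1)


if __name__ == "__main__":
    # Hypothetical setting: 8 micro-batches on 2, 3 and 4 pipeline stages.
    for k in (2, 3, 4):
        print(k, "stages: idle fraction =", bubble_fraction(k, 8))
```

Under these assumptions, adding stages at a fixed micro-batch count increases the idle fraction, while splitting each mini-batch into more micro-batches reduces it. This is one reason batch sizing matters alongside stage granularity.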
