From YOLOv5 to YOLOv8: Structural Innovations and Performance Improvements

Feiyu Chen; Yingqian Zhang; Lei Fu; Hui Xie; Qian Zhang; Shihao Bi

doi:10.62051/ijcsit.v6n1.12

Keywords:

Object detection, YOLO, YOLOv5, YOLOv8

Abstract

With the rapid advancement of object detection technology, the YOLO series has become ubiquitous across diverse computer vision applications owing to its efficiency and real-time capabilities. This paper delivers a systematic comparative analysis of YOLOv5 and YOLOv8, with an emphasis on their innovations and distinctions in network architecture, training mechanisms, inference optimizations, and detection performance. Relative to YOLOv5, YOLOv8 introduces substantial structural enhancements, notably the lightweight C2f feature extraction module and an anchor-free detection head, alongside state-of-the-art data augmentation strategies and novel loss functions that collectively boost both accuracy and inference speed. Furthermore, YOLOv8 advances inference optimization by supporting more flexible model export formats and acceleration pipelines, thereby facilitating deployment on mobile and edge devices. Through this comparison, we trace the technological evolution from YOLOv5 to YOLOv8 and project future trends in object detection research—particularly the integration of emerging techniques to further elevate model efficiency and performance. Our findings underscore the enduring potential of the YOLO series to drive progress in object detection methodologies.
