Underwater Object Detection Using YOLOv8 Enhanced with Region-based Feature Aggregation Attention

Rui Yang

doi:10.62051/ijcsit.v7n1.06

Keywords:

Underwater object detection, YOLOv8, Attention mechanism, RFA, Deep learning, Marine industry

Abstract

With growing demand for high-nutritional-value marine products such as sea cucumbers, scallops, and starfish, efficient underwater object detection has become critical for intelligent marine industry applications. Traditional manual sorting is inefficient, labor-intensive, and error-prone, which makes it unsuitable for large-scale industrial use. Deep learning-based object detection, particularly the YOLO family of algorithms, has shown great potential for addressing these challenges. However, existing models still struggle with the low image clarity, color distortion, and occlusion common in underwater environments. In this study, we propose an enhanced YOLOv8n model that integrates a Region-based Feature Aggregation (RFA) attention mechanism to improve feature representation in underwater scenes. The URPC2020 dataset was preprocessed and adapted for YOLOv8 training, and extensive experiments were conducted. The results show that the proposed model improves precision, recall, mAP50, and mAP50–95 by 5.6%, 7.6%, 6.7%, and 20.2%, respectively, over the baseline YOLOv8n. The proposed approach also outperforms state-of-the-art detectors, including YOLOv9s and YOLOv10s, while maintaining a lightweight architecture. An integrated underwater detection system with real-time image and video processing and a graphical interface was also developed to meet the practical needs of the marine industry.
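The abstract does not describe how the dataset was adapted for YOLOv8 training. As a rough illustration only, the sketch below assumes the annotations give each box as pixel corners (xmin, ymin, xmax, ymax) and converts them to the normalized `class x_center y_center width height` label lines that YOLO-style trainers read. The function names and the class index are hypothetical and are not taken from the paper.

```python
def corners_to_yolo(box, img_w, img_h):
    """Convert pixel corners (xmin, ymin, xmax, ymax) to a normalized
    (x_center, y_center, width, height) tuple with values in [0, 1]."""
    xmin, ymin, xmax, ymax = box
    x_center = (xmin + xmax) / 2.0 / img_w
    y_center = (ymin + ymax) / 2.0 / img_h
    width = (xmax - xmin) / img_w
    height = (ymax - ymin) / img_h
    return x_center, y_center, width, height


def yolo_label_line(class_id, box, img_w, img_h):
    """Format one object as a YOLO label line.

    class_id is a hypothetical integer index; the actual class ordering
    used by the authors is not stated in the source.
    """
    xc, yc, w, h = corners_to_yolo(box, img_w, img_h)
    return f"{class_id} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"


if __name__ == "__main__":
    # A 200x100 px box centered in a 400x200 px image.
    print(yolo_label_line(0, (100, 50, 300, 150), 400, 200))
```

Each image would then get one text file containing one such line per annotated object. Whether the authors also filtered, resized, or re-split the data is not stated here.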
