A Review of Object Detection Empowering Sports: Key Technologies, Application Scenarios, and Future Outlook
DOI: https://doi.org/10.62051/fb5aqp52
Keywords: Object Detection; Sports Analytics; Computer Vision; Deep Learning; Model Optimization; AI in Sports; Sports Big Data
Abstract
Object detection is now a cornerstone of 'Smart Sports,' yet directly applying general-purpose models to dynamic and often chaotic sports environments remains challenging. This paper systematically reviews the core technologies of object detection in sports, including the adaptability and limitations of mainstream detectors (e.g., the YOLO series and Transformer-based models) in sports scenarios. It also examines the role of optimization strategies such as model pruning, quantization, and knowledge distillation in balancing performance against resource consumption, as well as specialized techniques for small-object detection, motion-blur handling, and occlusion robustness. Building on this, the paper provides an in-depth analysis of the diverse applications of object detection in professional sports training (e.g., motion capture and biomechanical analysis), competitive game analysis (e.g., tactical minimap reconstruction from match videos), intelligent officiating (e.g., foul-recognition assistance), athlete performance evaluation, interactive sports broadcasting, and public fitness. Finally, the paper summarizes current challenges, including data bottlenecks, limited algorithm generalization, the complexity of multi-modal fusion, and the leap from perception to cognition. It also outlines future directions: constructing sports-specific vision foundation models, deepening multi-modal fusion, strengthening dynamic scene understanding, and improving sports datasets and evaluation systems, so that sports analytics can become more intelligent, personalized, and accessible.
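To make the model-compression discussion concrete, the sketch below illustrates response-based knowledge distillation, one of the optimization strategies named in the abstract. It is not taken from the reviewed work: the teacher/student networks, temperature, and loss weighting are illustrative assumptions standing in for a large detector backbone and its compact replacement.

```python
# Minimal sketch of response-based knowledge distillation (Hinton et al., 2015).
# A compact "student" is trained on both ground-truth labels and the
# temperature-softened outputs of a larger "teacher". All models and data here
# are placeholders, not the paper's actual setup.
import torch
import torch.nn as nn
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """Blend soft-target KL loss (temperature T) with hard-label cross-entropy."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)                      # T^2 rescaling keeps soft-target gradients comparable
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

# Placeholder teacher/student classifiers standing in for detector backbones.
teacher = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 512), nn.ReLU(), nn.Linear(512, 10))
student = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 64), nn.ReLU(), nn.Linear(64, 10))
optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)

# One illustrative training step on random data.
images, labels = torch.randn(8, 3, 32, 32), torch.randint(0, 10, (8,))
with torch.no_grad():
    t_logits = teacher(images)       # the teacher is used in inference mode only
s_logits = student(images)
loss = distillation_loss(s_logits, t_logits, labels)
loss.backward()
optimizer.step()
```

In a sports deployment, the same loss structure would typically be applied to the classification head of a detector so that a lightweight edge model retains most of the teacher's accuracy at a fraction of the compute cost.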
License
Copyright (c) 2025 Transactions on Computer Science and Intelligent Systems Research

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.