Visual Object Tracking Using Deep Learning Techniques: A Comparison

Authors

  • Enzheng Su

DOI:

https://doi.org/10.62051/7wag7n50

Keywords:

Visual Object Tracking, Deep Learning, Correlation Filter, Siamese Network.

Abstract

Visual object tracking is a fundamental topic in computer vision: given a target object annotated in the first frame of a video, the tracker must follow that object through all subsequent frames. Its applications are found in autonomous driving, human-computer interaction, military operations, and game development. The task remains challenging, however, because occlusion, scale variation, rotation, and other factors demand trackers that are robust and generalize well enough to produce accurate results. In this paper, we provide an extensive review of visual object tracking methods organized into three primary frameworks: correlation filter-based trackers, Siamese network-based trackers, and Transformer-based trackers. Through large-scale experiments on various benchmark datasets, we provide insights into a total of 12 state-of-the-art techniques spanning these categories, aiming to reveal the research frontier of the visual object tracking field and to point out promising directions for future investigation.
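To make the first of these frameworks concrete, the core idea of a correlation filter-based tracker (in the spirit of MOSSE, the earliest of the family reviewed here) can be sketched in a few lines: learn a filter in the Fourier domain whose correlation with the target patch produces a Gaussian peak, then locate the target in a new frame by the peak of the correlation response. This is a minimal illustrative sketch, not the implementation evaluated in the paper; the function names and the single-frame (no online update) formulation are simplifications chosen for clarity.

```python
import numpy as np

def gaussian_response(shape, sigma=2.0):
    """Ideal training response: a Gaussian peak centred on the target."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    return np.exp(-((xs - w // 2) ** 2 + (ys - h // 2) ** 2) / (2 * sigma ** 2))

def train_filter(patch, response, eps=1e-3):
    """Learn the filter in the Fourier domain (MOSSE-style closed form):
    H* = (G . conj(F)) / (F . conj(F) + eps), with eps for regularization."""
    F = np.fft.fft2(patch)
    G = np.fft.fft2(response)
    return (G * np.conj(F)) / (F * np.conj(F) + eps)

def locate(patch, H_conj):
    """Correlate a new patch with the filter; the response peak gives the
    target's displacement from the patch centre."""
    resp = np.real(np.fft.ifft2(H_conj * np.fft.fft2(patch)))
    dy, dx = np.unravel_index(np.argmax(resp), resp.shape)
    h, w = patch.shape
    return dy - h // 2, dx - w // 2
```

In a full tracker, the filter is additionally updated online with a running average as the target's appearance changes; later methods in this family (KCF, SRDCF) extend the same formulation with kernels, multi-channel features, and spatial regularization.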




Published

25-11-2024

How to Cite

Su, E. (2024) “Visual Object Tracking Using Deep Learning Techniques: A Comparison”, Transactions on Computer Science and Intelligent Systems Research, 7, pp. 619–626. doi:10.62051/7wag7n50.