Advancements in Semantic Segmentation Using Deep Learning Approaches

Chenhuan Ni

doi:10.62051/wkgenq24

Keywords:

Semantic segmentation; deep learning; computer vision.

Abstract

Semantic segmentation plays an important part in computer vision by assigning a semantic label to each pixel in an image. As deep learning has developed, many models have been proposed to improve segmentation performance, from traditional models such as CNNs, FCNs, and U-Net to more advanced ones such as the DeepLab series, GANs, and Transformer-based models. This paper offers a comprehensive overview of semantic segmentation methods based on deep learning. It begins with an introduction to the task and related background concepts, followed by a review of traditional and advanced models. It also summarizes current challenges, including the scarcity of labeled data, the difficulty of defining complex object boundaries, poor model generalization, and the handling of multi-scale information, to inform future research and real-world application. In addition, the study provides a comparative evaluation of the models' computational and performance characteristics. The overall goal of this work is to give a clear overview of the progress made in semantic segmentation using deep learning.
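As a minimal illustration of what per-pixel labeling means, the sketch below turns per-class score maps into a label map by picking the highest-scoring class at each pixel. It is not taken from the paper: the class names, the `label_pixels` helper, and the scores are hypothetical, and a real model (an FCN or U-Net, for example) would produce the score maps from an input image.

```python
# Hypothetical toy example: class names and scores are illustrative only.
CLASSES = ["background", "road", "car"]


def label_pixels(scores):
    """Return an H x W map of class indices from per-class score maps.

    scores[c][y][x] is the score of class c at pixel (y, x).
    """
    num_classes = len(scores)
    height = len(scores[0])
    width = len(scores[0][0])
    # For every pixel, keep the index of the class with the highest score.
    return [
        [max(range(num_classes), key=lambda c: scores[c][y][x]) for x in range(width)]
        for y in range(height)
    ]


# A 2 x 3 "image" with one score map per class.
scores = [
    [[0.7, 0.2, 0.1], [0.1, 0.1, 0.6]],  # background
    [[0.2, 0.7, 0.3], [0.8, 0.3, 0.3]],  # road
    [[0.1, 0.1, 0.6], [0.1, 0.6, 0.1]],  # car
]
label_map = label_pixels(scores)
print([[CLASSES[c] for c in row] for row in label_map])
```

The output has the same height and width as the input, with one class per pixel, which is what distinguishes semantic segmentation from image-level classification.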
