Image Privacy Item Recognition Based on Hybrid Model of Hierarchical Feature Recognition and ViT

Chengyuan Liu

doi:10.62051/ijcsit.v4n1.17

Keywords:

Deep convolutional neural network, Privacy protection, Hierarchical feature extraction, Vision Transformer, Secure image processing

Abstract

With the development of artificial intelligence technology, privacy risk detection has become particularly important in scenarios such as intelligent monitoring and identity authentication. However, existing techniques handle complex scenes and global features poorly, which lowers detection accuracy in some cases. This paper proposes a hybrid model that combines CNN, OCNN, and Transformer models for feature extraction and achieves higher detection accuracy. By combining the strengths of these different feature extraction methods, the approach improves the ability to identify privacy risks. Experimental results show that the proposed method outperforms existing techniques on multiple test sets, improving detection accuracy while also reducing the false alarm rate.
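
The abstract reports results in terms of detection accuracy and false alarm rate without defining them. As a minimal sketch, assuming the common confusion-matrix definitions (accuracy as the fraction of correct decisions, false alarm rate as the fraction of non-private images wrongly flagged), the two quantities could be computed as below. The function name `detection_metrics` and the counts are hypothetical illustrations, not values or code from the paper, and the paper may define these metrics differently.

```python
def detection_metrics(tp, fp, tn, fn):
    """Return (accuracy, false_alarm_rate) from confusion-matrix counts.

    Assumed definitions (not confirmed by the source):
      accuracy         = (TP + TN) / (TP + FP + TN + FN)
      false_alarm_rate = FP / (FP + TN)
    """
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("at least one sample is required")
    accuracy = (tp + tn) / total
    negatives = fp + tn
    # With no negative samples the rate is undefined; report 0.0 by convention here.
    false_alarm_rate = fp / negatives if negatives else 0.0
    return accuracy, false_alarm_rate


# Hypothetical counts for illustration only; not results from the paper.
acc, far = detection_metrics(tp=80, fp=5, tn=95, fn=20)
print(f"accuracy={acc:.3f}, false_alarm_rate={far:.3f}")
```

Under these definitions a model can raise accuracy without lowering the false alarm rate, which is why the two are reported separately.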
