Application of Pre-training Model in Natural Language Processing

Zhaosicheng Chu

doi:10.62051/8wsmzk54

Keywords:

Pre-trained model; natural language processing; BERT; large language model.

Abstract

Natural language processing is an important research area in artificial intelligence whose goal is to enable computers to understand human language. It draws on multiple disciplines, including linguistics, computer science, machine learning, mathematics, and cognitive psychology. It has two sides: the recognition and understanding of natural language, and the processing of natural language. Of these, recognition and understanding allow computers to represent the input language as meaningful, quantifiable symbols and relationships and to compute on them according to the task. Natural language processing activities include constructing models that explain language abilities and language applications, creating computational mechanisms for implementing and improving language models, designing practical systems based on language models, and exploring evaluation methods for such systems. This article elaborates on the conceptual framework of pre-trained models and discusses three mainstream pre-trained language model architectures. Comparative experiments across multiple metrics verify the strong performance of pre-trained models in natural language processing and their significant advantages over traditional methods. Finally, the current status and future prospects are summarized.
