Advancements in Natural Language Processing: A Study of Knowledge Graph Embedding Techniques

Authors

  • Jinsong Liu

DOI:

https://doi.org/10.62051/1tjsff20

Keywords:

Natural Language Processing, Knowledge Graph Embedding, TransE, Semantic Similarity, Vector Representation.

Abstract

Natural Language Processing (NLP) enables machines to understand, interpret, and produce human language, making it a cornerstone of artificial intelligence. The ability to represent and reason with structured knowledge is crucial for advancing NLP capabilities. Knowledge Graphs (KGs) offer a powerful way to model entities and their relationships, and Knowledge Graph Embedding (KGE) aims to learn low-dimensional vector representations of these entities and relations. This paper studies NLP technologies with a specific focus on KGE methods. It examines the principles and application of translation-based embedding models, particularly TransE, detailing its training methodology on a standard benchmark dataset (FB15k-237). The work covers data preprocessing, model initialization with specific embedding dimensions, an iterative training process using a margin-based loss, and evaluation through loss convergence and t-SNE visualization of the learned entity embeddings. The results demonstrate that the model converges effectively, as evidenced by a significant reduction in training loss over epochs. Furthermore, the visualizations reveal distinct clustering of entity types, indicating that the model successfully captures semantic similarities and differences. This study underscores the efficacy of KGE models in learning meaningful representations from structured data, paving the way for improved performance on downstream NLP tasks such as link prediction and entity classification.
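The margin-based TransE training described in the abstract can be sketched roughly as follows. This is a minimal NumPy illustration on a tiny toy graph, not the paper's actual implementation: the triples, dimensions, and hyperparameters here are assumed for demonstration, and FB15k-237 would substitute real entity and relation indices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy knowledge graph: (head, relation, tail) index triples.
# These indices are illustrative, not drawn from FB15k-237.
triples = [(0, 0, 1), (2, 0, 3), (0, 1, 2)]
n_ent, n_rel, dim = 4, 2, 8

# Uniform initialization in [-6/sqrt(dim), 6/sqrt(dim)], as in TransE.
bound = 6.0 / np.sqrt(dim)
E = rng.uniform(-bound, bound, (n_ent, dim))   # entity embeddings
R = rng.uniform(-bound, bound, (n_rel, dim))   # relation embeddings

def dist(h, r, t):
    """TransE score: L2 distance between h + r and t (lower = more plausible)."""
    return np.linalg.norm(E[h] + R[r] - E[t])

margin, lr = 1.0, 0.01
for epoch in range(200):
    for (h, r, t) in triples:
        # Negative sampling: corrupt the tail with a random entity.
        t_neg = int(rng.integers(n_ent))
        # Margin-based ranking loss: max(0, margin + d_pos - d_neg).
        loss = margin + dist(h, r, t) - dist(h, r, t_neg)
        if loss > 0:
            # Unit-normalized residuals give the gradient directions
            # of the two L2 distance terms.
            g_pos = E[h] + R[r] - E[t]
            g_pos /= np.linalg.norm(g_pos) + 1e-9
            g_neg = E[h] + R[r] - E[t_neg]
            g_neg /= np.linalg.norm(g_neg) + 1e-9
            # SGD step on each embedding involved in the triple pair.
            E[h] -= lr * (g_pos - g_neg)
            R[r] -= lr * (g_pos - g_neg)
            E[t] += lr * g_pos
            E[t_neg] -= lr * g_neg
    # Re-normalize entity embeddings each epoch, per the original algorithm.
    E /= np.linalg.norm(E, axis=1, keepdims=True)
```

After training, the learned `E` rows could be projected to 2D with t-SNE to inspect clustering of entity types, mirroring the evaluation the abstract describes.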




Published

19-08-2025

How to Cite

Liu, J. (2025) “Advancements in Natural Language Processing: A Study of Knowledge Graph Embedding Techniques”, Transactions on Computer Science and Intelligent Systems Research, 10, pp. 86–91. doi:10.62051/1tjsff20.