Tibetan Syllables Similarity Calculation Based on Word2Vec
Keywords:
Tibetan Syllables; Similarity Calculation; Word2Vec; Syllable Vectors.
Abstract
This article takes the Tibetan syllable as the basic unit of text representation and uses the CBOW model in Word2Vec to learn syllable vectors. To calculate the similarity between Tibetan syllables, several models are trained with different window sizes so that semantic relatedness is captured at different levels of context, and the similarity of a syllable pair is taken as the average of the similarities computed by these models. Finally, three examples showing similarity results for three syllables are presented to demonstrate the feasibility of using syllable similarity calculation for syllable correction.
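The averaging step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each trained model is available as a plain mapping from syllable to vector (in practice these could come from CBOW models trained with different window sizes, for example with a library such as gensim), that cosine similarity is the per-model measure, and that correction candidates are ranked by the averaged score. The function names, the Wylie-transliterated placeholder syllables and the two-dimensional toy vectors are all hypothetical and chosen only to keep the arithmetic easy to follow.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def average_similarity(models, s1, s2):
    """Average the cosine similarity of s1 and s2 over every model that knows both."""
    sims = [cosine(m[s1], m[s2]) for m in models if s1 in m and s2 in m]
    if not sims:
        raise KeyError(f"no model contains both {s1!r} and {s2!r}")
    return sum(sims) / len(sims)


def rank_candidates(models, target, candidates):
    """Sort candidate syllables by averaged similarity to target, highest first."""
    scored = [(c, average_similarity(models, target, c)) for c in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy data (assumption): each dict stands in for one model trained with a
# different window size. The vectors are made up, not learned from a corpus.
toy_models = [
    {"bod": [1.0, 0.0], "skad": [1.0, 0.0], "yig": [0.0, 1.0]},
    {"bod": [1.0, 0.0], "skad": [0.0, 1.0], "yig": [0.0, 1.0]},
]

if __name__ == "__main__":
    print(average_similarity(toy_models, "bod", "skad"))
    print(rank_candidates(toy_models, "bod", ["yig", "skad"]))
```

Averaging over models trained with different windows means that no single context size dominates the score; a syllable pair has to be related under several window settings to rank highly.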
License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.