Research on Sentiment Analysis Based on Ensemble Learning
DOI:
https://doi.org/10.62051/ijcsit.v4n3.17
Keywords:
Sentiment analysis, Ensemble learning, Random forest, Gradient boosting tree, Support vector machines
Abstract
This paper applies ensemble learning to sentiment analysis of tourist attraction reviews, with the aim of making those reviews more useful as a reference for tourists' travel decisions. As online reviews become more widespread, extracting sentiment information efficiently and accurately has become an important problem. In this study, random forest, gradient boosting decision tree (GBDT), and support vector machine (SVM) were used as the base learners, and a weighted voting strategy was used to combine them into an ensemble model. Parameters were optimized with cross-validation and grid search, and model performance was evaluated comprehensively with metrics such as AUC, Log Loss, MCC, and average accuracy. Experimental results show that the ensemble model achieves better classification accuracy and robustness than the individual base models, with an AUC of 0.92 and an MCC of 0.78. A performance surface plot over different thresholds was used to examine the balance between precision and recall and to determine the optimal threshold range. The model can provide reliable sentiment insights for tourism platform users, help scenic spot managers identify user needs and optimize services, and can be extended to other review analysis fields.
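The weighted voting step and the threshold analysis described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the probabilities, the weights (2, 2, 1), the threshold values, and all function names are hypothetical, chosen only to show how three base learners' positive-class probabilities might be combined by a weighted average (soft voting) and how precision, recall, and MCC change as the decision threshold moves. It assumes binary labels (1 = positive review) and uses only the Python standard library.

```python
import math


def weighted_soft_vote(probas, weights):
    """Weighted average of per-model positive-class probabilities."""
    total = sum(weights)
    n_samples = len(probas[0])
    return [
        sum(w * p[i] for w, p in zip(weights, probas)) / total
        for i in range(n_samples)
    ]


def confusion(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels in {0, 1}."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def mcc(y_true, y_pred):
    """Matthews correlation coefficient; 0.0 when the denominator is zero."""
    tp, fp, tn, fn = confusion(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def sweep_thresholds(y_true, scores, thresholds):
    """Return (threshold, precision, recall, mcc) for each threshold."""
    rows = []
    for t in thresholds:
        y_pred = [1 if s >= t else 0 for s in scores]
        tp, fp, tn, fn = confusion(y_true, y_pred)
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        rows.append((t, precision, recall, mcc(y_true, y_pred)))
    return rows


# Hypothetical toy data: true labels and each base learner's P(positive).
y_true = [1, 1, 1, 0, 0, 0]
rf_proba = [0.9, 0.6, 0.4, 0.3, 0.2, 0.6]
gbdt_proba = [0.8, 0.7, 0.6, 0.4, 0.1, 0.3]
svm_proba = [0.7, 0.4, 0.7, 0.2, 0.3, 0.4]
weights = [2.0, 2.0, 1.0]  # assumed weights, not taken from the paper

ensemble = weighted_soft_vote([rf_proba, gbdt_proba, svm_proba], weights)
rows = sweep_thresholds(y_true, ensemble, [0.3, 0.5, 0.7])

for threshold, precision, recall, score in rows:
    print(f"t={threshold:.1f} precision={precision:.2f} "
          f"recall={recall:.2f} mcc={score:.2f}")
```

On this toy data, lowering the threshold raises recall at the cost of precision, and raising it does the opposite, which is the trade-off a threshold sweep is meant to expose. In practice, scikit-learn's `VotingClassifier` with `voting="soft"` and a `weights` argument offers a comparable weighted combination, and `GridSearchCV` covers the cross-validated parameter search.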
License
Copyright (c) 2024 International Journal of Computer Science and Information Technology

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.







