A Cantonese Restaurant Review Dataset for Aspect Category Sentiment Analysis with XLM-RoBERTa

Authors

  • Yuxin Cai

DOI:

https://doi.org/10.62051/z3q00d75

Keywords:

Cantonese natural language processing; sentiment analysis; multilingual RoBERTa.

Abstract

The volume of online consumer reviews has surged as e-commerce continues to grow, providing valuable insights for both consumers and business owners. Aspect Category Sentiment Analysis (ACSA) identifies the sentiment polarity expressed toward specific aspect categories in a review. Despite extensive research on other languages, Cantonese remains underexplored because of its distinctive linguistic features and the scarcity of annotated datasets. This study seeks to bridge that gap by fine-tuning XLM-RoBERTa, a multilingual RoBERTa-based model, for ACSA on Cantonese restaurant reviews from OpenRice. A dataset is constructed by collecting and annotating 7,473 reviews across five aspect categories: food, service, ambience, price, and timeliness. The fine-tuned XLM-RoBERTa model achieves an accuracy of 75.17%, outperforming five baseline models. The results indicate that fine-tuned transformer-based models are effective for low-resource languages such as Cantonese and provide a solid foundation for future research on Cantonese sentiment analysis. This work contributes an annotated dataset and highlights potential directions for future research.
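
To make the task setup concrete, the following is a minimal sketch of how one annotated review could be expanded into per-aspect classification examples and how accuracy could be computed. It is an illustration under stated assumptions, not the author's pipeline: the abstract does not specify the polarity label set or the model input format, so `POLARITIES`, `AcsaExample`, `to_pair_examples`, and `accuracy` are hypothetical names, and the sample review is invented for demonstration.

```python
from dataclasses import dataclass

# The five aspect categories named in the abstract.
ASPECTS = ("food", "service", "ambience", "price", "timeliness")

# ASSUMPTION: a three-way polarity scheme; the abstract does not list the labels.
POLARITIES = ("negative", "neutral", "positive")


@dataclass(frozen=True)
class AcsaExample:
    """One (review, aspect) pair with its gold polarity (hypothetical schema)."""
    review: str
    aspect: str
    polarity: str


def to_pair_examples(review, aspect_labels):
    """Expand one annotated review into one example per annotated aspect.

    `aspect_labels` maps an aspect name to a polarity; aspects that the
    review does not mention are simply absent from the mapping.
    """
    unknown = set(aspect_labels) - set(ASPECTS)
    if unknown:
        raise ValueError(f"unknown aspect(s): {sorted(unknown)}")
    examples = []
    for aspect in ASPECTS:  # fixed order keeps the output deterministic
        polarity = aspect_labels.get(aspect)
        if polarity is None:
            continue  # aspect not annotated for this review
        if polarity not in POLARITIES:
            raise ValueError(f"unknown polarity: {polarity!r}")
        examples.append(AcsaExample(review, aspect, polarity))
    return examples


def accuracy(gold, predicted):
    """Fraction of predictions that exactly match the gold labels."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty list")
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


if __name__ == "__main__":
    # Invented sample: "The food is delicious, but the service is poor."
    review = "啲嘢食好好味,但係服務好差"
    for example in to_pair_examples(review, {"food": "positive", "service": "negative"}):
        print(example.aspect, example.polarity)
```

Under this framing, each (review, aspect) pair could be passed to a sequence classifier as a two-segment input, which is one common way to pose ACSA; whether the study used this formulation is not stated in the abstract.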

Published

25-11-2024

How to Cite

Cai, Y. (2024) “A Cantonese Restaurant Review Dataset for Aspect Category Sentiment Analysis with XLM-RoBERTa”, Transactions on Computer Science and Intelligent Systems Research, 7, pp. 234–241. doi:10.62051/z3q00d75.