Research on Keywords Extraction of Film Reviews Based on the KeyBERT Model
DOI: https://doi.org/10.62051/1zpndy68
Keywords: KeyBERT, Movie review, Unsupervised, Keywords
Abstract
Current film and television platforms still struggle to use public film reviews to gather useful information for users and companies, and there is little application or research on keyword extraction from film reviews. This study evaluates the effectiveness and feasibility of the KeyBERT model for extracting keywords from film reviews, using precision and recall to assess extraction quality. The test results indicate that the average precision and recall of keyword extraction from film reviews are 0.600 and 0.387, respectively, slightly lower than those for other types of text. By review type, plot-description reviews reach a precision of 0.80 and a recall of 0.50, higher than multidimensional subjective-analysis reviews (0.40 and 0.33) and subjective emotional-expression reviews (0.20 and 0.20). Notably, for subjective emotional-expression reviews, precision falls from 0.80 to 0.20 as review length increases from 100 to 500 words. While the KeyBERT model is suitable for extracting keywords from movie reviews, applying it well requires classifying reviews by type, breaking long texts into structured segments, and minimizing personal bias as far as possible.
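The abstract reports precision and recall per review but does not say how an extracted keyword is judged correct. The sketch below assumes exact, case-insensitive matching of extracted keywords against a human-annotated reference set for a single review. The function names `normalize` and `precision_recall` and the keyword lists are hypothetical illustrations, not data or code from the study. In practice the `extracted` list could be the keyword strings taken from the `(keyword, score)` pairs that KeyBERT's `extract_keywords` returns.

```python
from collections.abc import Iterable


def normalize(keyword: str) -> str:
    # Assumption: matching ignores case and extra whitespace.
    return " ".join(keyword.lower().split())


def precision_recall(extracted: Iterable[str], gold: Iterable[str]) -> tuple[float, float]:
    """Exact-match precision and recall of extracted keywords against a reference set."""
    pred = {normalize(k) for k in extracted}
    ref = {normalize(k) for k in gold}
    hits = len(pred & ref)  # keywords both extracted and annotated
    precision = hits / len(pred) if pred else 0.0  # share of extracted keywords that are correct
    recall = hits / len(ref) if ref else 0.0  # share of reference keywords that were found
    return precision, recall


# Hypothetical keyword lists for one review (illustrative only, not study data).
extracted = ["heist", "betrayal", "crew", "vault", "soundtrack"]
gold = ["heist", "betrayal", "crew", "vault", "double cross", "casino", "escape", "revenge"]

p, r = precision_recall(extracted, gold)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.80 recall=0.50
```

Here 4 of the 5 extracted keywords are correct (precision 4/5 = 0.80) and 4 of the 8 reference keywords are found (recall 4/8 = 0.50). Exact matching is a strict choice; a near-synonym such as "double-cross" for "double cross" would count as a miss, which is one reason reported scores depend on the matching rule.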
License
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.