Data Annotation Methodologies for Fake News

Authors

  • Ruiyi Wang

DOI:

https://doi.org/10.62051/tx2dxj37

Keywords:

Fake news; Data annotation; Fake news detection model.

Abstract

As technology develops, information spreads ever faster and more conveniently. Fake news has drawn wide attention because it spreads rapidly, is easily disguised as legitimate reporting, and can cause serious harm. Because the performance of existing fake news detection models depends heavily on the quality of their training datasets, constructing high-quality, lower-cost training datasets is crucial. This paper systematically reviews research progress in fake news dataset construction. First, it reviews the definition and categories of fake news and summarizes the mainstream existing datasets for fake news detection. Second, for both traditional text news and newly emerging multimodal news, it analyzes the strengths and weaknesses of existing annotation techniques from three perspectives: traditional manual annotation, semi-automated annotation, and dynamic annotation. Finally, it proposes future research directions that address the shortcomings of current datasets in dynamic annotation, multimodal fusion, and cross-domain generalization. High-quality datasets can effectively advance fake news detection technology and help it meet the challenges of an increasingly complex online information environment.
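
As a rough illustration of the semi-automated annotation category named above, the following is a minimal sketch of one common human-in-the-loop pattern: a model proposes a label for each news item, and items whose confidence falls below a threshold are routed to a human annotator. It is an assumption-laden example, not the method reviewed or proposed in this paper; the `predict` and `ask_human` callables, the 0.9 threshold, and the keyword-based toy predictor are all hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

@dataclass
class NewsItem:
    text: str
    label: Optional[str] = None  # e.g. "fake" or "real" once annotated
    source: str = "unlabeled"    # "auto" (model) or "human" after annotation

def semi_automated_annotate(
    items: List[NewsItem],
    predict: Callable[[str], Tuple[str, float]],  # hypothetical model: (label, confidence)
    ask_human: Callable[[str], str],              # hypothetical human annotator
    threshold: float = 0.9,                       # assumed confidence cut-off
) -> List[NewsItem]:
    """Accept the model's label when its confidence meets the threshold;
    otherwise defer the item to a human annotator."""
    for item in items:
        label, confidence = predict(item.text)
        if confidence >= threshold:
            item.label, item.source = label, "auto"
        else:
            item.label, item.source = ask_human(item.text), "human"
    return items

# Hypothetical stand-ins: a keyword "model" and a fixed "human" answer.
def toy_predict(text: str) -> Tuple[str, float]:
    if "shocking" in text.lower():
        return "fake", 0.95
    return "real", 0.60

def toy_human(text: str) -> str:
    return "real"

result = semi_automated_annotate(
    [NewsItem("SHOCKING miracle cure found"), NewsItem("City council approves budget")],
    toy_predict,
    toy_human,
)
for item in result:
    print(item.label, item.source)
# fake auto
# real human
```

In this sketch, raising the threshold sends more items to human annotators (higher cost, usually more reliable labels), while lowering it accepts more model labels (lower cost, more label noise). This mirrors the trade-off between dataset quality and annotation cost discussed in the abstract.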

Published

10-07-2025

How to Cite

Wang, R. (2025) “Data Annotation Methodologies for Fake News”, Transactions on Computer Science and Intelligent Systems Research, 9, pp. 185–190. doi:10.62051/tx2dxj37.