Research on the Adjudication of Infringement in the Use of Works in AI Data Training and the Statutory License System

Authors

  • Xucan Liu

DOI:

https://doi.org/10.62051/pwgck217

Keywords:

Artificial Intelligence; Data Training; Copyright Infringement; Statutory License; Balance of Interests.

Abstract

In response to the national policy directive on improving the development and governance mechanisms for generative artificial intelligence, and to address the challenges that AI technology poses to the current copyright system, this paper examines the core legal issues surrounding the use of works in AI model data training. Grounded in an analysis of the operational mechanisms of AI data training, it first analyzes the nature of the use of works during the data collection and model training stages and demonstrates that such use constitutes an act of use regulated under current copyright law. Accordingly, the use of such works by AI technology companies for training purposes without the consent of the rights holders infringes the exclusive rights of copyright holders, such as the right of reproduction. The paper further contends that neither the traditional licensing model nor the proposed fair use approach can adequately resolve the tension between protecting copyright holders' rights and advancing artificial intelligence technologies: an overly stringent intellectual property regime risks stifling technological innovation, while an excessively permissive framework may undermine the interests of rights holders. To address this dilemma, the paper advocates applying a statutory license to the field of data training. Such a regime would enable AI technology companies to make large-scale use of copyrighted works while ensuring that the legitimate interests of rights holders are duly respected. Moreover, this approach aligns with the dual value orientation of copyright law, which seeks to balance protection with limitation.

Published

17-08-2025

How to Cite

Liu, X. (2025). Research on the Adjudication of Infringement in the Use of Works in AI Data Training and the Statutory License System. Transactions on Social Science, Education and Humanities Research, 14, 386-395. https://doi.org/10.62051/pwgck217