Digital Library Forum, 2025, Vol. 21, Issue 9: 74-81, 8. DOI: 10.3772/j.issn.1673-2286.2025.09.008
Regulation of Copyright Infringement Risks in Large Model Training Data
Abstract
The development of the artificial intelligence industry has intensified global competition in large models, making them a frontline arena of industrial development. Training large models on massive volumes of high-quality data raises a series of copyright infringement problems. Taking a comparative-law perspective, this study contrasts the differing positions of the United States, Germany, and China in legislation and judicial practice and, following the distinct stages in which training data for large models are obtained and used, proposes a tiered regulatory system for the copyright infringement risks of training data. At the stage of obtaining training datasets, the copyright-law assessment of the reproduction required for machine learning must be reviewed carefully: the data must come from legitimate sources, the data copy must be unique, and it must be used exclusively for training, so that potential copyright risks are mitigated. At the stage of using training datasets, generative artificial intelligence large models must satisfy the non-expressive use requirement, which means assessing input and output together and strictly controlling the likelihood of substantially similar outputs in order to rule out infringement. Non-generative artificial intelligence large models may be brought directly under fair use, which excludes infringement. The fair use doctrine of copyright law remains a key legal pathway for legitimizing the acquisition and use of training data; it needs to be positioned accurately and systematically, with its boundaries of application established gradually on a case-by-case basis.
Key words
Large Model / Generative Artificial Intelligence / Training Data / Copyright Compliance / Infringement Risk / Fair Use
Classification
Social Sciences
Cite this article
DAI JiangLong, HE RuoNan. Regulation of Copyright Infringement Risks in Large Model Training Data [J]. Digital Library Forum, 2025, 21(9): 74-81, 8.
Funding
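This research was supported by the 2024 Ministry of Education Humanities and Social Sciences Youth Project "Research on Responding to Copyright Infringement Risks of Large Model Training Data" (No. 24YJC820011) and the 2025 Hubei Provincial Social Science Fund Special Project on Rule of Law in Hubei "Research on Rule-of-Law Safeguards for Intellectual Property in Hubei's Science and Technology Industry" (No. 200505).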