Regulation of Copyright Infringement Risks in Large Model Training Data

DAI JiangLong¹, HE RuoNan²

Digital Library Forum (数字图书馆论坛), 2025, Vol. 21, Issue 9: 74-81, 8. DOI: 10.3772/j.issn.1673-2286.2025.09.008

Author information

  • 1. Law and Business School (School of Intellectual Property), Wuhan Institute of Technology, Wuhan 430070, P.R. China
  • 2. Law School, Durham University, Durham DH1 3LE, U.K.

Abstract

The growth of the artificial intelligence industry has intensified global competition in large models, making them a frontline arena for industrial advancement. Training large models on massive, high-quality data raises a series of copyright infringement challenges. Following the successive stages in which training data is obtained and used, this study compares the differing positions of the United States, Germany, and China in legislation and judicial practice, and on that basis proposes a tiered regulatory system for copyright infringement risks in training data. At the stage of obtaining training datasets, the reproduction that machine learning requires must be carefully assessed under copyright law. This assessment covers the legitimacy of data sources, the requirement for unique data copies, and the restriction of use to the exclusive purpose of training, so as to mitigate potential copyright risks. At the stage of using training datasets, generative artificial intelligence large models must comply with the non-expressive use requirement. This calls for considering the input and output processes together while strictly controlling the likelihood of substantially similar outputs, so as to exclude copyright infringement risks. Non-generative artificial intelligence large models may be brought directly under fair use to exclude infringement. The fair use doctrine under copyright law remains a critical legal pathway for legitimizing the acquisition and use of training data. It needs to be positioned accurately and systematically, with the boundaries of its application established gradually, case by case.

Keywords

Large Model / Generative Artificial Intelligence / Training Data / Copyright Compliance / Infringement Risk / Fair Use

Classification

Social Sciences

Cite this article

DAI JiangLong, HE RuoNan. Regulation of Copyright Infringement Risks in Large Model Training Data [J]. Digital Library Forum (数字图书馆论坛), 2025, 21(9): 74-81, 8.

Funding

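This research was supported by the 2024 Ministry of Education Humanities and Social Sciences Youth Project "Research on Responding to Copyright Infringement Risks in Large Model Training Data" (No. 24YJC820011) and the 2025 Hubei Provincial Social Science Fund Rule-of-Law Hubei Special Project "Research on Intellectual Property Legal Protection for Hubei's Science and Technology Industry" (No. 200505).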

Digital Library Forum (数字图书馆论坛), ISSN 1673-2286
