网络安全与数据治理, 2026, Vol. 45, Issue (4): 9-16, 8. DOI: 10.19358/j.issn.2097-1788.2026.04.002
数据工厂的构成、建设模式和运营机制研究
Research on the composition, construction models and operation mechanisms of data factories
Abstract
High-quality datasets are the core fuel for training large AI models. At present, such datasets are built mainly by AI enterprises themselves, in a fragmented, workshop-style, and non-standardized manner that cannot keep pace with the rapid development of large AI models. Drawing on the development patterns of resource-based infrastructure such as water plants and power plants, and combining domestic and international best practices in facility-based production, this paper proposes the concept of the "data factory", defined as a production facility built specifically for large AI model applications that constructs high-quality datasets in a facility-based, large-scale manner. The paper systematically describes the three-level architecture of the data factory, which consists of a storage workshop, a production workshop, and a pilot workshop. It also proposes four construction models and four operation mechanisms, providing theoretical support and practical reference for promoting the facility-based, large-scale supply of high-quality datasets.
Keywords
data factory / high-quality dataset / data infrastructure / data element
Classification
Management science
Cite this article
涂群, 耿贵宁, 张茜茜. 数据工厂的构成、建设模式和运营机制研究[J]. 网络安全与数据治理, 2026, 45(4): 9-16, 8.
Funding
Beijing Social Science Fund (23GLC058)