
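Research on the Current Status of Large Language Model Datasets for Artificial Intelligence and Countermeasures for Enriching Them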

Hu Xiaonü¹, Li Tao², Li Shanshan²

Big Data (大数据), 2025, Vol. 11, Issue (6): 57-71, 15.

DOI: 10.11959/j.issn.2096-0271.2025085


Research on the status of large language model data set of artificial intelligence and enriching countermeasure


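Author information

1. Macau University of Science and Technology, Macau 999078; China Institute of Communications, Beijing 100846
2. China Unicom Research Institute, Beijing 100037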

Abstract

Training data for artificial intelligence large language models (LLMs) is typically characterized by large scale, high quality, and diverse data types. Although China's domestic data resources are abundant, high-quality Chinese-language LLM training data remains scarce, and it still lags behind that of the world's leading countries in both quantity and quality. Drawing on public datasets and the datasets of typical general-purpose LLMs in China and abroad, this paper studies and compares the state of LLM datasets at home and abroad in depth, analyzes the challenges and problems facing the development of LLM datasets in China, and proposes countermeasures and recommendations for enriching the supply of AI LLM datasets.

Key words

artificial intelligence; large language model; data set

Classification

Information Technology and Security Science

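Cite this article

Hu Xiaonü, Li Tao, Li Shanshan. Research on the status of large language model data set of artificial intelligence and enriching countermeasure[J]. Big Data, 2025, 11(6): 57-71, 15.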

Funding

Ministry of Industry and Information Technology of the People's Republic of China 2024 Guidance Soft Topic (No. GXZK2024-49)

The 2024 National Academic Society Service National Strategy Special Project: Key Technology Roadmap for AI Computing Power Network (No. (2024)837)

Big Data (大数据)

ISSN: 2096-0271
