网络安全与数据治理 (Cyber Security and Data Governance), 2025, Vol. 44, Issue 11: 52-59, 8. DOI: 10.19358/j.issn.2097-1788.2025.11.009
面向国产数据库的Text-to-SQL数据集设计
The design of Text-to-SQL datasets for domestic databases
李国深 1, 刘莹君 2, 于莉娜 2, 纪涛 2, 张航 1, 吴继冰 1
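Author information
- 1. National Key Laboratory of Big Data and Decision-Making, Changsha, Hunan 410073
- 2. National Key Laboratory of Intelligent Spatial Information, Beijing 100029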
Abstract
With the development of intelligent technology, the number and scale of databases have surged. Traditional data access technologies are slow and inefficient when processing massive volumes of data. Text-to-SQL technology has thus become an important bridge between user needs and database access. However, existing techniques are usually trained on open-source datasets built for non-domestic databases, and in practice they suffer from inconsistent database operation languages, a lack of domain knowledge, and poor reliability. To address this, and in line with the trend toward localization of software and hardware in the database field, this paper designs a Text-to-SQL dataset for domestic databases, adopts a two-stage training approach for large language models based on synthetic data, proposes a Text-to-SQL method for domestic databases based on large language models, and verifies the effectiveness of the method through experiments.
Key words
fine-tuning of large language models/synthetic dataset/preference learning/domestic database
Classification
Computer and automation
Cite this article
李国深,刘莹君,于莉娜,纪涛,张航,吴继冰.面向国产数据库的Text-to-SQL数据集设计[J].网络安全与数据治理,2025,44(11):52-59,8.