The design of Text-to-SQL datasets for domestic databases (面向国产数据库的Text-to-SQL数据集设计)

李国深 刘莹君 于莉娜 纪涛 张航 吴继冰

网络安全与数据治理, 2025, Vol. 44, Issue 11: 52-59, 8. DOI: 10.19358/j.issn.2097-1788.2025.11.009

The design of Text-to-SQL datasets for domestic databases

李国深 (1), 刘莹君 (2), 于莉娜 (2), 纪涛 (2), 张航 (1), 吴继冰 (1)

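Author information

  • 1. National Key Laboratory of Big Data and Decision (大数据与决策国家级重点实验室), Changsha, Hunan 410073
  • 2. National Key Laboratory of Intelligent Spatial Information (智能空间信息国家级重点实验室), Beijing 100029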

Abstract

With the development of intelligent technologies, the number and scale of databases have surged. Traditional data access techniques are slow and inefficient when faced with massive data-processing workloads, and Text-to-SQL has therefore become an important bridge between user needs and database access. However, existing approaches are usually trained on open-source, non-domestic datasets, and in practice they suffer from inconsistent database operation languages, a lack of domain knowledge, and poor reliability. To address this, and in line with the trend toward localization of software and hardware in the database field, this paper designs a Text-to-SQL dataset for domestic databases, adopts a two-stage training scheme for large language models based on synthetic data, proposes a Text-to-SQL method for domestic databases based on large language models, and verifies the effectiveness of the method through experiments.

Key words

fine-tuning of large language models / synthetic dataset / preference learning / domestic database

Classification

Computer science and automation

Cite this article

李国深, 刘莹君, 于莉娜, 纪涛, 张航, 吴继冰. 面向国产数据库的Text-to-SQL数据集设计[J]. 网络安全与数据治理, 2025, 44(11): 52-59, 8.

网络安全与数据治理, ISSN 2097-1788
