Research on a Science and Technology Policy and Regulation Q&A System Driven by Large Models (大模型驱动的科技政策法规问答系统研究)
OA; PKU Core Journal (北大核心); CSTPCD
Question-and-answer (Q&A) systems for science and technology (S&T) policies and regulations play a key role in helping the public understand and apply these regulations. Large language models (LLMs) can significantly improve the accuracy and efficiency of such systems. However, LLM-based S&T policy and regulation Q&A systems still face two problems. First, large-scale, high-quality Q&A datasets for S&T policies and regulations are lacking, and existing methods for automatically constructing large-scale datasets fall short in citing and integrating policy and regulation knowledge. Second, when answering S&T policy and regulation questions, such systems lack domain expertise and accuracy, and the model's knowledge lags behind updates to the regulations. To address these problems, this paper proposes a retrieval-augmented self-prompting method for constructing Q&A datasets and uses it to build a large-scale, high-quality S&T policy and regulation Q&A dataset. It also builds an S&T policy and regulation Q&A system that combines an LLM fine-tuned with low-rank adaptation (LoRA) with an S&T policy and regulation knowledge base, and uses prompt learning to guide the system toward accurate answers. Experimental results show that the constructed dataset cites and integrates S&T policy and regulation knowledge significantly better than datasets built with traditional methods, and that the proposed Q&A system clearly outperforms Q&A systems driven by general-purpose LLMs on all evaluated metrics.
Xiang Xiaowei (向小伟); Shen Yanguang (申艳光); Hu Minghao (胡明昊); Yan Tianwei (闫天伟); Luo Wei (罗威); Luo Zhunchen (罗准辰)
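School of Information and Electrical Engineering, Hebei University of Engineering, Handan, Hebei 056038; Military Science Information Research Center, Academy of Military Sciences, Beijing 100142; College of Computer, National University of Defense Technology, Changsha 410037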
Computer Science and Automation
large language model; question-and-answer dataset; low-rank adaptive fine-tuning; prompt learning; science and technology policy and regulation question-and-answer system
Journal of Frontiers of Computer Science and Technology (《计算机科学与探索》), 2024 (009)
pp. 2349-2360 (12 pages)
This work was supported by the General Program of the National Natural Science Foundation of China (62376284).