A retrieval-augmented generation method based on domain knowledge

Hebei Journal of Industrial Science and Technology (河北工业科技), 2025, Vol. 42, Issue 2: 103-110, 196, 9. DOI: 10.7535/hbgykj.2025yx02001

Zhang Gaofei (张高飞) 1, Li Huan (李欢) 1, Chi Yunxian (池云仙) 1, Zhao Qiaohong (赵巧红) 2, Gou Zhinan (勾智楠) 3, Gao Kai (高凯) 1

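Author information

  • 1. School of Information Science and Engineering, Hebei University of Science and Technology, Shijiazhuang, Hebei 050018
  • 2. Hebei Vocational College of Rail Transportation, Shijiazhuang, Hebei 050801
  • 3. School of Management Science and Information Engineering, Hebei University of Economics and Business, Shijiazhuang, Hebei 050061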

Abstract

To improve the accuracy of answers that large language models (LLMs) generate from retrieved documents, a retrieval-augmented generation method based on domain knowledge was proposed. First, in the retrieval stage, a first layer of sparse retrieval was performed using both the question and domain knowledge, yielding a domain-specific dataset for the subsequent dense retrieval. Second, in the generation stage, a zero-shot learning method was used: domain knowledge was concatenated before or after the question, and the result was combined with the retrieved documents as input to the LLM. Finally, extensive experiments and performance evaluations were conducted on medical and legal datasets using ChatGLM2-6B and Baichuan2-7B-chat. The results indicate that the method effectively improves the domain relevance of the answers generated by LLMs, and that the zero-shot learning method outperforms the fine-tuning method. With zero-shot learning, combining sparse retrieval that incorporates domain knowledge with placement of domain knowledge before the question yields the largest improvement on ChatGLM2-6B: ROUGE-1, ROUGE-2 and ROUGE-L scores increase by 3.82, 1.68 and 4.32 percentage points, respectively, over the baseline method. The proposed method can enhance the accuracy of answers generated by LLMs and provides an important reference for research on and application of open-domain question answering.
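
The pipeline described in the abstract (sparse retrieval over the question plus domain knowledge, dense retrieval over the resulting candidates, then a zero-shot prompt with the domain knowledge placed before or after the question) can be sketched roughly as follows. This is an illustrative sketch, not the authors' implementation. Term-overlap scoring stands in for a real sparse retriever, and a bag-of-words cosine stands in for a neural dense encoder. All function names, the toy corpus, and the prompt template are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a stand-in for a real tokenizer)."""
    return text.lower().split()


def sparse_retrieve(question, domain_knowledge, corpus, k):
    """Stage 1: rank documents by term overlap with question + domain knowledge.

    Assumption: plain overlap counting replaces a real sparse scorer.
    """
    query_terms = set(tokenize(question + " " + domain_knowledge))
    scored = [(len(query_terms & set(tokenize(doc))), doc) for doc in corpus]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Keep at most k documents and drop those sharing no term with the query.
    return [doc for score, doc in scored[:k] if score > 0]


def embed(text):
    """Toy 'embedding': a bag-of-words count vector, not a neural encoder."""
    return Counter(tokenize(text))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def dense_retrieve(question, candidates, k):
    """Stage 2: re-rank the stage-1 candidates by similarity to the question."""
    q = embed(question)
    ranked = sorted(candidates, key=lambda doc: cosine(q, embed(doc)), reverse=True)
    return ranked[:k]


def build_prompt(question, domain_knowledge, documents, knowledge_first=True):
    """Zero-shot prompt with domain knowledge before or after the question."""
    if knowledge_first:
        query = f"{domain_knowledge} {question}"
    else:
        query = f"{question} {domain_knowledge}"
    context = "\n".join(documents)
    return f"Documents:\n{context}\n\nQuestion: {query}\nAnswer:"


# Hypothetical toy corpus mixing medical, legal, and unrelated text.
corpus = [
    "aspirin is a medicine used to reduce fever and pain",
    "a contract is a legal agreement between parties",
    "fever is a common symptom of infection in medicine",
    "the weather today is sunny",
]
question = "how to reduce fever"
domain = "medicine"

candidates = sparse_retrieve(question, domain, corpus, k=3)
top_docs = dense_retrieve(question, candidates, k=1)
prompt = build_prompt(question, domain, top_docs, knowledge_first=True)
print(prompt)
```

In this toy setup the domain term narrows the first stage to the two medical documents, and the second stage then picks the one closest to the question. Setting `knowledge_first=False` gives the variant with the domain knowledge placed after the question.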

Keywords

natural language processing/open-domain question answering/retrieval-augmented generation/large language model/zero-shot learning/domain knowledge

Classification

Computers and automation

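Cite this article

Zhang Gaofei, Li Huan, Chi Yunxian, Zhao Qiaohong, Gou Zhinan, Gao Kai. A retrieval-augmented generation method based on domain knowledge [J]. Hebei Journal of Industrial Science and Technology, 2025, 42(2): 103-110, 196, 9.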

Funding

Natural Science Foundation of Hebei Province (F2022208006, F2023207003)

Science and Technology Research Project of Higher Education Institutions of Hebei Province (QN2024196)

Hebei Journal of Industrial Science and Technology

ISSN 1008-1534
