Digital Library Forum (数字图书馆论坛), 2025, Vol. 21, Issue 7: 1-12, 12. DOI: 10.3772/j.issn.1673-2286.2025.07.001
混合任务场景下基于大语言模型的动态检索增强生成
Dynamic Retrieval-Augmented Generation for Mixed Task Scenarios Based on Large Language Models
Abstract
To address the inefficient cross-domain knowledge integration and limited task generalization of large language models in multi-task, multilingual scenarios, this paper proposes a dynamic retrieval-based knowledge enhancement framework for mixed task scenarios that improves the quality of content generated by large language models. A neural network classification tree model based on reinforcement learning is proposed. Through a labeled tree structure, heterogeneous knowledge bases are mapped modularly to leaf nodes; with retrieval of the optimal knowledge base as the objective, augmentation knowledge is extracted from that knowledge base and combined with the model, so that the best-matching external knowledge is selected adaptively for each input. Experiments are designed along two dimensions: knowledge retrieval and augmented generation. First, retrieval accuracy is evaluated in a mixed task scenario. Second, taking Japanese text summarization as an example, performance gains are studied empirically on two public datasets, XL-Sum and WikiLingua. The results show that the proposed framework retrieves effective knowledge with high accuracy in a mixed task scenario comprising 24 datasets, and improves ROUGE scores on the summarization task more than traditional retrieval-augmented methods do. The framework is practical and scalable, and offers an effective solution for adapting large language models to mixed task scenarios.
Keywords
Large Language Model / Retrieval-Augmented Generation / Reinforcement Learning / Multi-Task Learning / Intelligent Information Processing
Classification
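Computer Science and Automation
Cite this article
余传明, 李昊轩. Dynamic Retrieval-Augmented Generation for Mixed Task Scenarios Based on Large Language Models [J]. Digital Library Forum (数字图书馆论坛), 2025, 21(7): 1-12, 12.
Funding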
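This research was supported by the General Program of the National Natural Science Foundation of China through the projects "Research on Domain Knowledge Representation and Fusion Models for Cross-Lingual Opinion Summarization" (No. 71974202) and "Research on Knowledge-Enhanced Models for Identifying and Evaluating Innovation in Scientific and Technical Literature" (No. 72374219), and by the Zhongnan University of Economics and Law project "Development and Application of High-Quality Teaching Cases for Multilingual Natural Language Processing" (No. ALJS202520).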
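Illustrative sketch

The abstract describes routing an input through a labeled tree whose leaf nodes hold knowledge bases, then augmenting generation with knowledge taken from the selected leaf. The following is a minimal sketch of that routing idea only, not the authors' implementation. The paper's classifier is a neural network trained with reinforcement learning; here a simple keyword-overlap score stands in for it. Every name (`Node`, `route`, `retrieve`, `build_prompt`), the toy tree, and the toy passages are hypothetical and introduced purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Set


def tokens(text: str) -> Set[str]:
    """Lowercased whitespace tokens (a toy tokenizer, assumed for illustration)."""
    return set(text.lower().split())


@dataclass
class Node:
    """A node of the label tree; only leaves carry a knowledge base."""
    label: str
    keywords: Set[str] = field(default_factory=set)
    children: List["Node"] = field(default_factory=list)
    knowledge_base: Optional[List[str]] = None


def overlap_score(query: str, node: Node) -> float:
    """Stand-in scorer: the paper uses an RL-trained neural classifier instead."""
    return float(len(tokens(query) & node.keywords))


def route(root: Node, query: str,
          score: Callable[[str, Node], float] = overlap_score) -> Node:
    """Descend from the root, picking the best-scoring child until a leaf is reached."""
    node = root
    while node.children:
        node = max(node.children, key=lambda child: score(query, child))
    return node


def retrieve(leaf: Node, query: str, k: int = 1) -> List[str]:
    """Return the k passages of the leaf's knowledge base with most token overlap."""
    q = tokens(query)
    ranked = sorted(leaf.knowledge_base or [],
                    key=lambda passage: len(q & tokens(passage)),
                    reverse=True)
    return ranked[:k]


def build_prompt(query: str, passages: List[str]) -> str:
    """Combine retrieved knowledge with the input before it is sent to the model."""
    context = "\n".join(passages)
    return f"Context:\n{context}\n\nInput:\n{query}"


# Hypothetical two-level tree: task type first, then domain.
tree = Node("root", children=[
    Node("summarization", {"summarize", "summary"}, children=[
        Node("news", {"news", "article"}, knowledge_base=[
            "summaries state the main event first",
            "a news article usually opens with a headline",
        ]),
        Node("howto", {"guide", "steps"}, knowledge_base=[
            "how-to summaries list the steps in order",
        ]),
    ]),
    Node("qa", {"question", "answer"}, knowledge_base=[
        "answers should cite the supporting passage",
    ]),
])

query = "summarize this news article about the election"
leaf = route(tree, query)
passages = retrieve(leaf, query, k=1)
prompt = build_prompt(query, passages)
```

Because `route` accepts the scorer as a parameter, a learned classifier could replace `overlap_score` without changing the tree traversal; how the paper trains and applies its classifier is not specified in the abstract.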