Journal of Agricultural Big Data (农业大数据学报), 2024, Vol. 6, Issue (3): 412-423, 12. DOI: 10.19788/j.issn.2096-6369.000052
Construction Process and Technological Prospects of Large Language Models in the Agricultural Vertical Domain
(Original Chinese title: 农业垂直领域大语言模型构建流程和技术展望)
Abstract
With the proliferation of the internet, accessing agricultural knowledge and information has become more convenient. However, this information is often static and generic, and it fails to provide tailored solutions for specific situations. To address this issue, vertical domain models in agriculture combine agricultural data with large language models (LLMs), using natural language processing and semantic understanding technologies to provide real-time answers to agricultural questions and to play a crucial role in agricultural decision-making and extension. This paper details the construction process of LLMs in the agricultural vertical domain, including data collection and preprocessing, selection of an appropriate pre-trained LLM base model, fine-tuning, retrieval-augmented generation (RAG), and evaluation. The paper also discusses the application of the LangChain framework in agricultural Q&A systems. Finally, the paper summarizes several challenges in building LLMs for the agricultural vertical domain, namely data security, model forgetting, and model hallucination, and proposes future development directions for agricultural models, including the use of multimodal data, real-time data updates, the integration of multilingual knowledge, and the optimization of fine-tuning costs, to further promote the intelligence and modernization of agricultural production.
Key words
large language models (LLMs) / retrieval-augmented generation (RAG) / LangChain / agricultural Q&A systems
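Citation
Zhang Yuqin, Zhu Jingquan, Dong Wei, Li Fuzhong, Guo Leifeng (张宇芹, 朱景全, 董薇, 李富忠, 郭雷风). Construction Process and Technological Prospects of Large Language Models in the Agricultural Vertical Domain [J]. Journal of Agricultural Big Data, 2024, 6(3): 412-423, 12.
Funding
Science and Technology Innovation 2030 Major Project (2021ZD0110901); Guizhou Provincial Science and Technology Support Program (grant no. 黔科合支撑[2023]一般189).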