A Crop Knowledge Question-Answering System Based on Agri-QA Net, a Multimodal Fusion Large Model Architecture

WU Huarui, ZHAO Chunjiang, LI Jingchen

Smart Agriculture, 2025, Vol. 7, Issue 1: 1-10. DOI: 10.12133/j.smartag.SA202411005

Agri-QA Net: Multimodal Fusion Large Language Model Architecture for Crop Knowledge Question-Answering System

Author information

  • 1. Information Technology Research Center, Beijing Academy of Agriculture and Forestry Sciences, Beijing 100079, China

Abstract

[Objective] As agriculture increasingly relies on technological innovation to boost productivity and ensure sustainability, farmers need efficient and accurate tools to support their decision-making. A key challenge is the retrieval of specialized agricultural knowledge, which is complex and diverse. Traditional agricultural knowledge retrieval systems have often been limited to a single modality (e.g., text or images alone), which restricts their effectiveness across the wide range of queries farmers face. To address this challenge, a specialized multimodal question-answering system tailored for cabbage cultivation was proposed. The system, named Agri-QA Net, integrates multimodal data to enhance the accuracy and applicability of agricultural knowledge retrieval. By incorporating diverse data modalities, Agri-QA Net lets farmers interact with the system through multiple types of input, ranging from spoken queries to images of crop conditions. This helps address the complexity of real-world agricultural environments and improves the accessibility of relevant information.
[Methods] The architecture of Agri-QA Net was built on the integration of multiple data modalities: textual, auditory, and visual. This approach allowed the system to learn from a wide array of sources and to develop a comprehensive understanding of agricultural knowledge, enhancing its robustness and generalizability. The system incorporated state-of-the-art deep learning models, each designed to handle one specific type of data. For text, the bidirectional attention mechanism of Bidirectional Encoder Representations from Transformers (BERT) allowed the model to understand the context of each word in a given sentence, significantly improving its ability to comprehend complex agricultural terminology and specialized concepts. The system also incorporated acoustic models for processing audio inputs. These models analyzed spoken queries from farmers, allowing the system to understand natural language inputs even in noisy, non-ideal environments, a common challenge in real-world agricultural settings. Additionally, convolutional neural networks (CNNs) were employed to process images from various stages of cabbage growth. CNNs are highly effective at capturing spatial hierarchies in images, making them well suited to tasks such as identifying pests, diseases, or growth abnormalities in cabbage crops. The features from these models were subsequently fused in a Transformer-based fusion layer, which served as the core of the Agri-QA Net architecture. The fusion process ensured that each modality (text, audio, and image) contributed effectively to the model's final understanding of a given query. This allowed the system to provide more nuanced answers to complex agricultural questions, such as identifying specific crop diseases or determining optimal irrigation schedules for cabbage crops. In addition to the fusion layer, cross-modal attention mechanisms and domain-adaptive techniques were incorporated to refine the model's ability to understand and apply specialized agricultural knowledge. The cross-modal attention mechanism facilitated dynamic interactions between the text, audio, and image data, ensuring that the model attended to the most relevant features from each modality. Domain-adaptive techniques further enhanced the system's performance by tailoring it to specific agricultural contexts, such as cabbage farming, pest control, or irrigation management.
[Results and Discussions] The experimental evaluations demonstrated that Agri-QA Net outperformed traditional single-modal and simple multimodal models in agricultural knowledge tasks. With multimodal inputs, the system achieved an accuracy of 89.5%, a precision of 87.9%, a recall of 91.3%, and an F1-Score of 89.6%, all of which were significantly higher than those of single-modality models. The integration of multimodal data significantly enhanced the system's capacity to understand complex agricultural queries, providing more precise and context-aware answers. The addition of cross-modal attention mechanisms enabled more nuanced and dynamic interaction between the text, audio, and image data, which in turn improved the model's understanding of ambiguous or context-dependent queries, such as those concerning disease diagnosis or crop management. Furthermore, the domain-adaptive technique enabled the system to focus on specific agricultural terminology and concepts, thereby enhancing its performance in specialized tasks such as cabbage cultivation and pest control. The case studies presented further validated the system's ability to assist farmers by providing actionable, domain-specific answers, demonstrating its practical application in real-world agricultural scenarios.
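As a quick consistency check on the metrics reported above, the F1-Score can be recomputed from the stated precision and recall as their harmonic mean. The sketch below is illustrative only; the function name `f1_score` and the variable names are assumptions for this example, not taken from the paper.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract, converted from percentages to fractions.
reported_precision = 0.879
reported_recall = 0.913

f1 = f1_score(reported_precision, reported_recall)
print(f"F1 = {f1:.1%}")  # prints "F1 = 89.6%"
```

The recomputed value rounds to 89.6%, which agrees with the reported F1-Score.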
[Conclusions] The proposed Agri-QA Net framework is an effective solution for answering agricultural knowledge questions, especially in the domain of cabbage cultivation. By integrating multimodal data and leveraging advanced deep learning techniques, the system demonstrates a high level of accuracy and adaptability. This study not only highlights the potential of multimodal fusion in agriculture but also paves the way for future intelligent systems designed to support precision farming. Future work will focus on improving the model's performance by expanding the dataset to include more diverse agricultural scenarios, refining the handling of dialectal variation in audio inputs, and improving the system's ability to detect rare crop diseases. The ultimate goal is to contribute to the modernization of agricultural practices, offering farmers more reliable and effective tools for solving the challenges of crop management.
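To make the cross-modal attention idea from the Methods concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not give the layer's details, so this uses a single scaled dot-product attention step without learned projections, and the feature vectors, the query, and the name `cross_modal_attention` are all hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def cross_modal_attention(query, modality_features):
    """Attend from one query vector over one feature vector per modality.

    Returns (weights, fused): a dict of attention weights keyed by modality
    name, and the attention-weighted sum of the modality feature vectors.
    """
    names = list(modality_features)
    scale = math.sqrt(len(query))  # standard 1/sqrt(d) scaling
    scores = [dot(query, modality_features[n]) / scale for n in names]
    weights = softmax(scores)
    fused = [
        sum(w * modality_features[n][i] for w, n in zip(weights, names))
        for i in range(len(query))
    ]
    return dict(zip(names, weights)), fused


# Hypothetical 4-dimensional features standing in for encoder outputs
# (e.g. a text encoder, an acoustic model, and a CNN).
features = {
    "text": [0.9, 0.1, 0.0, 0.2],
    "audio": [0.2, 0.8, 0.1, 0.0],
    "image": [0.1, 0.0, 0.9, 0.3],
}
query = [1.0, 0.0, 0.5, 0.0]  # hypothetical query representation

weights, fused = cross_modal_attention(query, features)
print(weights)
print(fused)
```

In this toy setup the text features align best with the query, so they receive the largest weight; a trained fusion layer would learn such weightings from data rather than rely on raw dot products.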

Key words

multimodal fusion/human-computer interaction/agricultural knowledge Q&A/cabbage crops/large language model

Classification

Agricultural engineering

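Cite this article

WU Huarui, ZHAO Chunjiang, LI Jingchen. Agri-QA Net: Multimodal Fusion Large Language Model Architecture for Crop Knowledge Question-Answering System[J]. Smart Agriculture, 2025, 7(1): 1-10.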

Funding

National Key Research and Development Program of China (2021ZD0113604)

Scientific and Technological Innovation 2030-Major Project (2022ZD0115705-05)

Smart Agriculture

ISSN 2096-8094
