Transactions of the Chinese Society of Agricultural Engineering (农业工程学报), 2026, Vol. 42, Issue (3): 56-65, 10. DOI: 10.11975/j.issn.1002-6819.202510004

Harvesting decision-making on the asynchronously ripened fruits of facility-grown tomatoes using improved DeepSeek

Yuan Shuai¹, Jiang Dan¹, Rao Yuan¹, Wang Tan¹, Xu Ziyue¹, Wu Kanglei¹

Author information

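1. School of Information and Artificial Intelligence, Anhui Agricultural University, Hefei 230036; Key Laboratory of Agricultural Sensors, Ministry of Agriculture and Rural Affairs, Hefei 230036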

Abstract

Facility-grown tomatoes cultivated at large scale ripen asynchronously, so harvesting priority must be planned accurately; otherwise, harvest scheduling and fruit quality suffer. This study proposes DeepSeek-VKQ, a harvesting decision-making framework for asynchronously ripened facility-grown tomatoes built on DeepSeek-7B. The framework combines enhanced visual perception, structured knowledge retrieval, and logical reasoning. First, a comprehensive corpus was constructed covering basic tomato cultivation knowledge, ripening stages, and pest and disease control strategies, integrating multi-source data from technical manuals, academic literature, web resources, and expert experience; from it, 7,000 structured Q&A pairs were developed for knowledge reasoning. Second, the YOLOv11n backbone network was refined to enhance visual feature extraction: a global attention mechanism (GAM) suppresses interference from the foliar background, and the conventional CIoU loss was replaced with the SIoU loss to improve bounding-box regression accuracy under occlusion. Third, detection outputs were converted into structured form: fruit coordinates were spatially discretized, and detection confidence was fused with the maturity level into a continuous ripeness index ranging from 0 to 1.0, with a non-linear weight function modulating the probability flow between adjacent maturity levels. Discrete detection outputs were thereby transformed into a continuous index that reflects gradual ripening, mapping detection results onto decision semantics. Knowledge reasoning relied mainly on a dynamic knowledge base. Multi-source textual knowledge was mapped into a low-dimensional semantic space by the BGE-M3 semantic embedding model, and the resulting feature vectors were stored in a vector database whose indexes link back to the original knowledge entries for efficient retrieval. Key environmental parameters were acquired in real time through the API interfaces of meteorological platforms. Together these provide precise semantic matching for reasoning guided by chain-of-thought (CoT) decomposition, so that the large language model (LLM) is deeply integrated with dynamically updated agronomic and meteorological knowledge. The framework was validated on annotated tomato images. The visual extractor achieved a mean average precision (mAP) of 87.6% at IoU thresholds of 0.5-0.95, which is 2.5, 3.2, and 2.9 percentage points higher than YOLOv12n, YOLOv13n, and RT-DETRv2, respectively, with an inference time of 10.2 ms. On tomato harvesting decision tasks, the framework achieved a precision, recall, and F1-score of 88.4%, 91.7%, and 90.0%, respectively, improvements of 21.0, 18.0, and 19.6 percentage points over the original DeepSeek-7B model. Ablation experiments attributed F1-score gains of 7.8 percentage points to the vision module, 6.6 percentage points to knowledge retrieval, and 3.6 percentage points to CoT decomposition, so each component contributes to the overall performance. Against benchmark models, the 7B-parameter DeepSeek-VKQ outperformed several larger open-source multimodal models, exceeding GLM-4V-9B, InternLM3 (20B), Qwen2.5-VL (72B), and DeepSeek-VL2 (27.5B) by 16.1, 17.2, 10.8, and 12.6 percentage points in F1-score, respectively. Its performance also approached that of the leading closed-source multimodal models: the F1-score of 90.0% trailed GPT-4o's 90.9% by 0.9 percentage points, the recall of 91.7% surpassed GPT-4o's 90.0%, and the precision of 88.4% was 3.4 percentage points below GPT-4o's 91.8%, all with a fraction of the parameter count. In task-specific evaluations, hallucination rates stayed below 6.5% across all subtasks. In addition, a third-party large language model was used to evaluate the framework's performance over diverse tasks, which strengthens confidence in its reliability for practical applications. By integrating cross-modal perception, knowledge retrieval, and logical reasoning, the framework achieves high-precision tomato harvest decision-making and can provide technical support for robotic harvesting decisions in facility-grown tomato cultivation.
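
The fusion of detection confidence and maturity level into a continuous ripeness index, as described in the abstract, can be illustrated with a minimal Python sketch. The abstract does not give the formula, so everything here is an assumption made for illustration: the four stage names, the `gamma` exponent, the power-law weight, and the helper names `ripeness_index` and `harvest_order` are hypothetical and should not be read as the authors' method.

```python
from dataclasses import dataclass

# Hypothetical maturity stages, ordered from least to most ripe.
# The abstract does not list the stages; these names are assumptions.
STAGES = ("green", "turning", "pink", "red")


@dataclass
class Detection:
    fruit_id: str
    stage: str         # predicted maturity label, one of STAGES
    confidence: float  # detector confidence in [0, 1]


def ripeness_index(stage: str, confidence: float, gamma: float = 2.0) -> float:
    """Blend a discrete stage and its confidence into a value in [0, 1].

    Assumption: stage k anchors the index at k / (K - 1), and a
    low-confidence detection is pulled toward the previous stage by a
    non-linear weight w = confidence ** gamma.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must lie in [0, 1]")
    k = STAGES.index(stage)
    step = 1.0 / (len(STAGES) - 1)
    anchor = k * step              # index value of the predicted stage
    lower = max(k - 1, 0) * step   # index value of the adjacent lower stage
    weight = confidence ** gamma   # non-linear confidence weighting
    return lower + weight * (anchor - lower)


def harvest_order(detections: list[Detection]) -> list[Detection]:
    """Rank fruits from ripest to least ripe by the continuous index."""
    return sorted(
        detections,
        key=lambda d: ripeness_index(d.stage, d.confidence),
        reverse=True,
    )


if __name__ == "__main__":
    fruits = [
        Detection("a", "green", 0.90),
        Detection("b", "red", 0.95),
        Detection("c", "pink", 0.80),
    ]
    for d in harvest_order(fruits):
        print(d.fruit_id, round(ripeness_index(d.stage, d.confidence), 3))
```

Under these assumptions, a fully confident detection of the ripest stage maps to 1.0, a low-confidence detection is pulled toward the preceding stage, and sorting by the index in descending order gives one possible harvesting priority.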

Key words

facility/tomato/asynchronous ripening/large language model/multimodal/knowledge base/chain of thought/harvesting priority

Classification

Agricultural science and technology

Cite this article

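Yuan Shuai, Jiang Dan, Rao Yuan, Wang Tan, Xu Ziyue, Wu Kanglei. Harvesting decision-making on the asynchronously ripened fruits of facility-grown tomatoes using improved DeepSeek[J]. Transactions of the Chinese Society of Agricultural Engineering, 2026, 42(3): 56-65, 10.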

Funding

National Natural Science Foundation of China (32371993)

Anhui Provincial Key Research and Development Program (2023n06020057)

Transactions of the Chinese Society of Agricultural Engineering

ISSN: 1002-6819
