Journal of South China Agricultural University, 2026, Vol. 47, Issue (1): 86-93, 8. DOI: 10.7671/j.issn.1001-411X.202507003
Agricultural named entity recognition technology based on thought chain distillation and counterfactual reasoning
Abstract
[Objective] To address the hallucinations, contextual logical inconsistencies, and inability to run on low-resource devices that arise when large language models perform named entity recognition (NER) in agriculture. [Method] With DeepSeek with 671 billion parameters (DeepSeek-671B) as the teacher model, domain knowledge was transferred to student models with fewer parameters. The student models were low-parameter versions of DeepSeek, Qwen, and Llama (1.5 billion, 7.0 billion, and 14.0 billion parameters, abbreviated as 1.5B, 7.0B, and 14B, respectively), which underwent distillation and counterfactual reasoning training. Model performance was validated experimentally on CropDiseaseNer, a specialized agricultural disease dataset. [Result] Among the distilled student models, DeepSeek-14B achieved an entity recognition F1 score of 89.60% while requiring only 2.08% of the teacher model's parameters. It significantly outperformed both the general-purpose large model GPT-mini-14B (F1 score: 57.64%) and GLiNER, a domain-adapted model based on a general LLM (F1 score: 82.96%). Further analysis showed that the DeepSeek student model, which shares the same architecture as the teacher, outperformed student models with different architectures on long-tail categories such as disease entities and pathogen genus names, owing to its parameter alignment advantage. [Conclusion] This study validates the effectiveness of knowledge distillation for NER tasks in the agricultural domain and offers a novel solution for entity recognition in resource-constrained scenarios.
Key words
Agriculture / Large language model / Knowledge distillation / Named entity recognition
Classification
Agricultural science and technology
Cite this article
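吴泽震, 张奕, 黄泳彬, 兰玉彬, 孟祥宝, 邓小玲. Agricultural named entity recognition technology based on thought chain distillation and counterfactual reasoning[J]. Journal of South China Agricultural University, 2026, 47(1): 86-93, 8.
Funding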
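Guangdong Provincial Key Research and Development Program (2023B0202090001)
National Natural Science Foundation of China (32371984)
Special Project in Key Fields of Artificial Intelligence for Regular Universities of Guangdong Province (2019KZDZX1012)
National Key Research and Development Program of China (2023YFD2000200)
Program for Introducing Talents of Discipline to Universities (D18019)
Research Project of the Guangdong Laboratory for Lingnan Modern Agriculture (NT2021009)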