Smart Agriculture, 2025, Vol. 7, Issue 1: 44-56, 13. DOI: 10.12133/j.smartag.SA202410022
Chinese Kiwifruit Text Named Entity Recognition Method Based on Dual-Dimensional Information and Pruning
Abstract
[Objective] Chinese kiwifruit texts exhibit unique dual-dimensional characteristics. Cross-paragraph dependencies form a complex semantic structure, which makes it challenging to capture the full contextual relationships of entities within a single paragraph and necessitates models capable of robust cross-paragraph semantic extraction to comprehend entity linkages at a global level. However, most existing models rely heavily on local contextual information and struggle to process long-distance dependencies, thereby reducing recognition accuracy. Furthermore, Chinese kiwifruit texts often contain highly nested entities. This nesting and combination increase the complexity of grammatical and semantic relationships, making entity recognition more difficult. To address these challenges, a novel named entity recognition (NER) method, KIWI-Coord-Prune (kiwifruit-CoordKIWINER-PruneBi-LSTM), was proposed in this research, which incorporated dual-dimensional information processing and pruning techniques to improve recognition accuracy.
[Methods] The proposed KIWI-Coord-Prune model consisted of a character embedding layer, a CoordKIWINER layer, a PruneBi-LSTM layer, a self-attention mechanism, and a CRF decoding layer, enabling effective entity recognition after processing input character vectors. The CoordKIWINER and PruneBi-LSTM modules were specifically designed to handle the dual-dimensional features in Chinese kiwifruit texts. The CoordKIWINER module applied adaptive average pooling in two directions on the input feature maps and utilized convolution operations to separate the extracted features into vertical and horizontal branches. The horizontal and vertical features were then independently extracted using the Criss-Cross Attention (CCNet) mechanism and the Coordinate Attention (CoordAtt) mechanism, respectively. This module significantly enhanced the model's ability to capture cross-paragraph relationships and nested entity structures, thereby generating enriched character vectors containing more contextual information, which improved the overall representation capability and robustness of the model. The PruneBi-LSTM module was built upon the enhanced dual-dimensional vector representations and introduced a pruning strategy into Bi-LSTM to effectively reduce redundant parameters associated with background descriptions and irrelevant terms. This pruning mechanism enhanced computational efficiency and inference speed while maintaining the dynamic sequence modeling capability of Bi-LSTM. Additionally, a dynamic feature extraction strategy was employed to reduce the computational complexity of vector sequences and further strengthen the learning capacity for key features, leading to improved recognition of complex entities in kiwifruit texts. Furthermore, the pruned weight matrices became sparser, significantly reducing memory consumption. This made the model more efficient in handling large-scale agricultural text-processing tasks, minimizing redundant information while achieving higher inference and training efficiency with fewer computational resources.
[Results and Discussions] Experiments were conducted on the self-built KIWIPRO dataset and four public datasets: People's Daily, ClueNER, Boson, and ResumeNER. The proposed model was compared with five advanced NER models: LSTM, Bi-LSTM, LR-CNN, Softlexicon-LSTM, and KIWINER. The experimental results showed that KIWI-Coord-Prune achieved F1-Scores of 89.55%, 91.02%, 83.50%, 83.49%, and 95.81% on the five datasets, respectively, outperforming all baseline models. Furthermore, controlled variable experiments were conducted to compare and ablate the CoordKIWINER and PruneBi-LSTM modules across the five datasets, confirming their effectiveness and necessity. Additionally, the impact of different design choices was explored for the CoordKIWINER module, including direct fusion, optimized attention mechanism fusion, and network structure adjustment with residual optimization. The experimental results demonstrated that the optimized attention mechanism fusion method yielded the best performance, and it was ultimately adopted in the final model. These findings highlight the significance of properly designing attention mechanisms to extract dual-dimensional features for NER tasks. Compared to existing methods, the KIWI-Coord-Prune model effectively addressed the issue of underutilized dual-dimensional information in Chinese kiwifruit texts. It significantly improved entity recognition performance for both overall text structures and individual entity categories. Furthermore, the model exhibited a degree of generalization capability, making it applicable to downstream tasks such as knowledge graph construction and question-answering systems.
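The two-direction average pooling mentioned in the Methods can be illustrated with a minimal sketch. It is not the authors' CoordKIWINER implementation: the function name `directional_average_pool` and the toy feature map are hypothetical, a single channel is assumed, and the subsequent convolution and attention steps are omitted. It only shows how one 2-D feature map yields two 1-D summaries, one per axis.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def directional_average_pool(feature_map: Matrix) -> Tuple[List[float], List[float]]:
    """Average-pool a 2-D feature map along each axis separately.

    Returns (row_means, col_means):
      row_means[i] is the mean of row i (pooled across the width),
      col_means[j] is the mean of column j (pooled across the height).
    Hypothetical helper for illustration only.
    """
    if not feature_map or not feature_map[0]:
        raise ValueError("feature_map must be a non-empty 2-D list")
    height = len(feature_map)
    width = len(feature_map[0])
    if any(len(row) != width for row in feature_map):
        raise ValueError("all rows must have the same length")

    # Pool across the width: one value per row.
    row_means = [sum(row) / width for row in feature_map]
    # Pool across the height: one value per column.
    col_means = [
        sum(feature_map[i][j] for i in range(height)) / height
        for j in range(width)
    ]
    return row_means, col_means


if __name__ == "__main__":
    toy = [
        [1.0, 2.0, 3.0],
        [4.0, 5.0, 6.0],
    ]
    rows, cols = directional_average_pool(toy)
    print(rows)  # [2.0, 5.0]
    print(cols)  # [2.5, 3.5, 4.5]
```

Each summary keeps positional information along one axis while averaging out the other, which is what allows the two branches to be processed separately afterwards.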
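The link between pruning and sparser weight matrices described in the Methods can be sketched as follows. The abstract does not state which pruning criterion PruneBi-LSTM uses, so this example assumes plain magnitude pruning as a stand-in; `magnitude_prune`, `zero_fraction`, and the toy weights are hypothetical names and values chosen for illustration.

```python
from typing import List

Matrix = List[List[float]]


def magnitude_prune(weights: Matrix, sparsity: float) -> Matrix:
    """Return a copy of `weights` with the smallest-magnitude entries set to 0.

    `sparsity` is the fraction of entries to zero; int(sparsity * n) entries
    are removed, with ties broken by position. Assumed criterion: magnitude
    pruning, which the source does not confirm.
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    # Flatten to (magnitude, row, col) so entries can be ranked globally.
    ranked = sorted(
        (abs(w), i, j)
        for i, row in enumerate(weights)
        for j, w in enumerate(row)
    )
    n_prune = int(sparsity * len(ranked))
    to_zero = {(i, j) for _, i, j in ranked[:n_prune]}
    return [
        [0.0 if (i, j) in to_zero else w for j, w in enumerate(row)]
        for i, row in enumerate(weights)
    ]


def zero_fraction(weights: Matrix) -> float:
    """Fraction of entries that are exactly zero."""
    total = sum(len(row) for row in weights)
    zeros = sum(1 for row in weights for w in row if w == 0.0)
    return zeros / total


if __name__ == "__main__":
    toy_weights = [
        [0.9, -0.05, 0.3],
        [0.01, -0.7, 0.2],
    ]
    pruned = magnitude_prune(toy_weights, sparsity=0.5)
    print(pruned)                 # [[0.9, 0.0, 0.3], [0.0, -0.7, 0.0]]
    print(zero_fraction(pruned))  # 0.5
```

Memory savings follow only if the zeroed matrix is then stored in a sparse format or the zeroed units are physically removed; the sketch stops at producing the sparse pattern.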
[Conclusions] This study presents a novel NER approach for Chinese kiwifruit texts that integrates dual-dimensional information extraction and pruning techniques to overcome challenges related to cross-paragraph dependencies and nested entity structures. The findings offer valuable insights for researchers working on domain-specific NER and contribute to the advancement of agriculture-focused natural language processing applications. However, two key limitations remain: 1) the balance between domain-specific optimization and cross-domain generalization requires further investigation, as the model's adaptability to non-agricultural texts has yet to be empirically validated; 2) the multilingual applicability of the model is currently limited, necessitating further expansion to accommodate multilingual scenarios. Future research should focus on two key directions: 1) enhancing domain robustness and cross-lingual adaptability by incorporating diverse textual datasets and leveraging pre-trained multilingual models to improve generalization, and 2) validating the model's performance in multilingual environments through transfer learning while refining linguistic adaptation strategies to further optimize recognition accuracy.
Key words
Chinese named entity recognition / kiwifruit texts / custom-built dataset / multi-dimensional attention mechanism / pruning / deep learning / text feature enhancement
Classification
Information Technology and Security Science
Cite this article
QI Zijun, NIU Dangdang, WU Huarui, ZHANG Lilin, WANG Lunfeng, ZHANG Hongming. Chinese Kiwifruit Text Named Entity Recognition Method Based on Dual-Dimensional Information and Pruning[J]. Smart Agriculture, 2025, 7(1): 44-56, 13.
Funding
Shaanxi Province Qin Chuang Yuan "Scientist+Engineer" Team Building Project (2022KXJ-67)
National Natural Science Foundation of China (62206222)