Method for Calculating Semantic Similarity of Short Agricultural Texts Based on Transfer Learning

Smart Agriculture (智慧农业（中英文）), 2025, Vol. 7, Issue 1: 33-43, 11. DOI: 10.12133/j.smartag.SA202410026

Jin Ning (金宁)¹, Guo Yufeng (郭宇峰)², Han Xiaodong (韩晓东)¹, Miao Yisheng (缪祎晟)³, Wu Huarui (吴华瑞)³

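Author information

1. School of Computer Science and Engineering, Shenyang Jianzhu University, Shenyang 110168, Liaoning, China
2. School of Computer Science and Engineering, Shenyang Jianzhu University, Shenyang 110168, Liaoning, China; National Engineering Research Center for Information Technology in Agriculture, Beijing 100097, China
3. National Engineering Research Center for Information Technology in Agriculture, Beijing 100097, China; Key Laboratory of Agricultural Information Technology, Ministry of Agriculture and Rural Affairs, Beijing 100097, China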

Abstract

[Objective] Intelligent agricultural knowledge services have become an active research area and an important support for the construction of smart agriculture. The "China Agricultural Technology Extension" platform provides users with efficient and convenient agricultural information consultation services via mobile terminals and has accumulated a vast amount of Q&A data. These data are large in volume, rapidly updated, and highly redundant, so the platform suffers from frequent repeated questions, slow responses to questions, and inaccurate information retrieval. A high-quality method for calculating text semantic similarity is urgently needed to address these challenges and to improve the efficiency and intelligence of the platform's information services. To address the incomplete feature extraction of existing text semantic similarity models and the lack of annotated short agro-text datasets, a semantic similarity calculation model for short agro-text, CWPT-SBERT, was proposed based on transfer learning and the BERT pre-trained model.
[Methods] CWPT-SBERT was based on a Siamese architecture whose left and right branches were identical and shared parameters, which gave it low structural complexity and high training efficiency. Sharing parameters reduced the consumption of computational resources and ensured that the two input texts were compared in the same feature space. CWPT-SBERT consisted of four main parts: a semantic enhancement layer, an embedding layer, a pooling layer, and a similarity measurement layer. In the semantic enhancement layer, the CWPT method, based on the word segmentation unit, was proposed to divide Chinese characters into finer-grained sub-units, maximizing the semantic features available in short Chinese text and improving the model's understanding of complex Chinese vocabulary and character structures. In the embedding layer, a transfer learning strategy was used to extract features from agricultural short texts based on SBERT: the model first captured the semantic features of general-domain Chinese text and then, after fine-tuning, generated semantic feature vectors better suited to the agricultural domain. Training on large-scale annotated general-domain datasets in this way addressed the problems of limited annotated short agro-text data and high semantic sparsity. The pooling layer used an average pooling strategy to map the high-dimensional semantic vectors of Chinese short text to a low-dimensional vector space. The similarity measurement layer used cosine similarity to measure the similarity between the semantic feature vectors of the two short texts, and the computed similarity was then fed into the loss function to guide model training, optimize model parameters, and improve the accuracy of the similarity calculation.
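The pooling and similarity measurement steps described in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation: `toy_encode` is a hypothetical stand-in for the fine-tuned BERT encoder, and the function names and the 2-dimensional vectors are assumptions chosen only to keep the example self-contained. The point it shows is that both texts pass through the same encoder, are mean-pooled over non-padding tokens, and are then compared by cosine similarity.

```python
import math


def mean_pool(token_vectors, attention_mask):
    """Average the token vectors whose mask entry is 1, ignoring padding."""
    dim = len(token_vectors[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(token_vectors, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vec):
                total[i] += value
    if count == 0:
        raise ValueError("attention_mask selects no tokens")
    return [t / count for t in total]


def cosine_similarity(u, v):
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def siamese_similarity(encode, text_a, text_b):
    """Encode both texts with the SAME function (shared parameters), then compare."""
    vecs_a, mask_a = encode(text_a)
    vecs_b, mask_b = encode(text_b)
    return cosine_similarity(mean_pool(vecs_a, mask_a), mean_pool(vecs_b, mask_b))


def toy_encode(text, max_len=8):
    """Hypothetical encoder stand-in: one 2-d vector per character, zero-padded."""
    vecs = [[float(ord(ch) % 7), float(ord(ch) % 11)] for ch in text[:max_len]]
    mask = [1] * len(vecs)
    while len(vecs) < max_len:
        vecs.append([0.0, 0.0])
        mask.append(0)
    return vecs, mask
```

In a real system the similarity score would be passed to a loss function during fine-tuning, as the Methods describe; the sketch covers inference only.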
[Results and Discussions] For the task of calculating semantic similarity in agricultural short texts, on a dataset containing 19 968 pairs of short agro-texts, the CWPT-SBERT model achieved accuracy rates of 97.18% and 96.93%, a recall of 97.14%, and an F1-score of 97.04%, all higher than those of 12 comparison models, including TextCNN_Attention, MaLSTM, and SBERT. An analysis of the Pearson and Spearman coefficients of CWPT-SBERT, SBERT, SALBERT, and SRoBERTa trained on short agro-text datasets showed that the initial training value of CWPT-SBERT was significantly higher than those of the comparison models and was close to their highest values. Moreover, it grew smoothly during training, indicating that CWPT-SBERT had strong correlation, robustness, and generalization ability from the initial state. During training, it not only learned the features in the training data but also applied these features effectively to new domain data. Additionally, the ALBERT, RoBERTa, and BERT models were fine-tuned on short agro-text datasets and optimized by using morphological structure features to enrich the semantic feature expression of the text. Ablation experiments showed that both optimization strategies effectively improved model performance. An analysis of the attention weight heatmap of Chinese character morphological structure highlighted the importance of radicals in representing the attributes of Chinese characters, which enhanced the semantic representation of Chinese characters in vector space. Complex correlations were also found within the morphological structure of Chinese characters.
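For readers unfamiliar with the evaluation measures named in the results, the following sketch computes them from scratch. It is an illustration under assumptions, not the authors' evaluation code: the function names are hypothetical, labels are assumed to be binary (1 = similar pair, 0 = dissimilar pair), and Spearman's coefficient is computed as the Pearson coefficient of average ranks.

```python
import math


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary labels (1 = similar pair)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0.0 or sd_y == 0.0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sd_x * sd_y)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))
```

Pearson measures linear agreement between predicted similarity scores and gold labels, while Spearman measures only whether their ordering agrees, which is why the two are commonly reported together for similarity models.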
[Conclusions] CWPT-SBERT uses transfer learning to address the problems of limited annotated short agro-text datasets and high semantic sparsity. By using the Chinese-oriented word segmentation method CWPT to break down Chinese characters, it enhances the semantic representation of word vectors and enriches the semantic feature expression of short texts. The CWPT-SBERT model achieves high semantic similarity accuracy on small-scale short agro-text datasets with clear performance advantages, providing an effective technical reference for intelligent semantic matching.
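The idea of breaking Chinese characters into finer-grained sub-units can be sketched as follows. The abstract does not specify how CWPT decomposes characters or how the sub-units are fed to the model, so everything here is an assumption: `COMPONENTS` is a small hand-written table (not the authors' resource), and appending each character's components directly after it is just one plausible way to expose glyph structure to a downstream encoder.

```python
# Hypothetical, hand-written decomposition table for illustration only.
# It is NOT the resource used by CWPT; a real system would need full coverage.
COMPONENTS = {
    "稻": ["禾", "舀"],  # rice plant: grain radical + phonetic component
    "种": ["禾", "中"],  # seed / to plant
    "病": ["疒", "丙"],  # disease: sickness radical + phonetic component
    "粮": ["米", "良"],  # grain / provisions
}


def split_into_subunits(text, table=COMPONENTS):
    """Pair each character with its sub-units; unknown characters map to themselves."""
    return [(ch, table.get(ch, [ch])) for ch in text]


def enhanced_tokens(text, table=COMPONENTS):
    """Flatten to a token list: each character, followed by its components if any."""
    tokens = []
    for ch, parts in split_into_subunits(text, table):
        tokens.append(ch)
        if parts != [ch]:
            tokens.extend(parts)
    return tokens
```

For example, `enhanced_tokens("稻病")` yields the two original characters interleaved with their components, so a model sees that 稻 and 种 share the grain radical 禾 even though the characters themselves differ.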

Key words

transfer learning; short agro-text; semantic similarity calculation; glyph features; intelligent knowledge service; big model

Classification

Information Technology and Security Science

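Cite this article

Jin Ning, Guo Yufeng, Han Xiaodong, Miao Yisheng, Wu Huarui. Method for Calculating Semantic Similarity of Short Agricultural Texts Based on Transfer Learning [J]. Smart Agriculture (智慧农业（中英文）), 2025, 7(1): 33-43, 11.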

Funding

National Key Research and Development Program of China (2024YFD200803-3)

Basic Research Project of Education Department of Liaoning Province (LJKQZ20222458)

Liaoning Province Science and Technology Plan Joint Plan (2024-MSLH-399)

Smart Agriculture (智慧农业（中英文）)

ISSN: 2096-8094
