Long text semantic similarity calculation combining hybrid feature extraction and deep learning

Xu Jie (徐捷) 1, Shao Yubin (邵玉斌) 1, Du Qingzhi (杜庆治) 1, Long Hua (龙华) 2, Ma Dinan (马迪南) 3

Computer Engineering & Science, 2024, Vol. 46, Issue 8: 1513-1520 (8 pages). DOI: 10.3969/j.issn.1007-130X.2024.08.020

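Author information

  • 1. Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, Yunnan 650504, China
  • 2. Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, Yunnan 650504, China; Yunnan Key Laboratory of Media Convergence, Kunming, Yunnan 650228, China
  • 3. Yunnan Key Laboratory of Media Convergence, Kunming, Yunnan 650228, China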

Abstract

Text semantic similarity calculation is a crucial task in natural language processing, but current research on similarity mostly focuses on short texts rather than long texts. Compared with short texts, long texts are semantically rich, but their semantic information tends to be scattered. To address this problem, a feature extraction method is proposed to extract the main semantic information from long texts. The extracted semantic information is then fed into a pre-trained BERT model using a sliding-window overlap approach to obtain text vector representations. A bidirectional long short-term memory (BiLSTM) network then models the contextual semantic relationships of long texts, mapping them into a semantic space. The model's representation ability is further enhanced by adding a linear layer. Finally, the model is fine-tuned by maximizing the inner product of semantically similar vectors and minimizing the cross-entropy loss. Experimental results show that this method achieves F1 scores of 0.84 and 0.91 on the CNSE and CNSS datasets, respectively, outperforming the baseline models.
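
The sliding-window overlap step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state the window length, the amount of overlap, or the tokenizer, so the function name `sliding_windows` and the default values of `window` and `stride` below are assumptions chosen only for illustration (510 leaves room for BERT's `[CLS]` and `[SEP]` tokens within a 512-token input). This is not the authors' implementation.

```python
from typing import List, Sequence


def sliding_windows(
    tokens: Sequence[str], window: int = 510, stride: int = 255
) -> List[List[str]]:
    """Split a token sequence into overlapping windows.

    `window` and `stride` are illustrative assumptions, not values from
    the paper. Consecutive windows share `window - stride` tokens.
    """
    if window <= 0 or not 0 < stride <= window:
        raise ValueError("require window > 0 and 0 < stride <= window")
    if len(tokens) == 0:
        return []

    chunks: List[List[str]] = []
    start = 0
    while True:
        chunks.append(list(tokens[start:start + window]))
        # Stop once the current window reaches the end of the sequence.
        if start + window >= len(tokens):
            break
        start += stride
    return chunks


if __name__ == "__main__":
    # Toy input: single characters stand in for tokens.
    for chunk in sliding_windows(list("abcdefghij"), window=4, stride=2):
        print("".join(chunk))
```

In a pipeline like the one described, each window would be encoded separately by BERT and the resulting sequence of window vectors passed to the BiLSTM; the overlap keeps context that would otherwise be cut at window boundaries.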

Keywords

long text semantic similarity; feature extraction; BERT pre-trained model; semantic space

Classification

Information Technology and Security Science

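Cite this article

Xu Jie, Shao Yubin, Du Qingzhi, Long Hua, Ma Dinan. Long text semantic similarity calculation combining hybrid feature extraction and deep learning [J]. Computer Engineering & Science, 2024, 46(8): 1513-1520.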

Funding

Project of the Yunnan Key Laboratory of Media Convergence (220235205)

Computer Engineering & Science

Open access; Peking University Core Journal (北大核心); CSTPCD

ISSN 1007-130X
