Text Similarity Measurement Method of Scientific Research Projects Based on Hierarchical Depth Semantics

Yang Zheng¹, Fang Zhengyun², Li Tianjiao³, Li Limin³

Computer & Digital Engineering, 2024, Vol. 52, Issue 3: 795-801, 851 (8 pages). DOI: 10.3969/j.issn.1672-9722.2024.03.028

Author information

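  • 1. Information and Intelligence Research Institute, Electric Power Research Institute, Yunnan Power Grid Co., Ltd., Kunming 650217
  • 2. Yunnan Power Grid Co., Ltd., Kunming, Yunnan 650214
  • 3. School of Mathematics and Statistics, Xi'an Jiaotong University, Xi'an 710049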

Abstract

Duplicate checking of research project documents is an important issue in academia, and text similarity measurement is a key step in it. Current text similarity measures for research projects rely mainly on string comparison or TF-IDF, neither of which takes the semantic features of the text into account. This paper proposes a hierarchical semantic similarity measurement method for electric power technology project documents. The method uses the pre-trained model RoBERTa-WWM together with Whitening to extract the semantic features of sentences, and builds a hierarchical deep semantic similarity between project texts through cosine similarity. The hierarchy has three levels: similarity between sentences, similarity between chapters, and similarity between articles. Experiments on the AFQMC dataset show the effectiveness of the Whitening method, and experiments on 50 power technology project documents and their corresponding translated versions verify that the proposed method outperforms similarity measures based on string distance and TF-IDF.
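The abstract names the pipeline (RoBERTa-WWM sentence vectors, Whitening, cosine similarity, three levels) but not how sentence scores are lifted to chapters and articles. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation. It assumes sentence vectors have already been produced by RoBERTa-WWM (random stand-in vectors are used here), applies the common BERT-whitening recipe (subtract the mean, then multiply by U·diag(1/√s) from the SVD of the covariance), and scores sentence pairs by cosine similarity. The chapter- and article-level rule, a symmetric best-match average, is a hypothetical choice, and every function name (`fit_whitening`, `chapter_similarity`, `article_similarity`, etc.) is illustrative. It requires NumPy.

```python
import numpy as np


def fit_whitening(vecs, k=None, eps=1e-8):
    """Estimate whitening parameters (mu, W) from an (n, d) matrix of sentence vectors.

    After the transform the vectors have zero mean and (approximately) identity
    covariance. Optional k keeps only the top-k principal directions.
    """
    mu = vecs.mean(axis=0, keepdims=True)
    cov = np.cov(vecs, rowvar=False)          # (d, d) sample covariance
    u, s, _ = np.linalg.svd(cov)              # for a symmetric PSD cov: cov = U diag(s) U^T
    w = u @ np.diag(1.0 / np.sqrt(s + eps))   # eps guards against zero eigenvalues
    if k is not None:
        w = w[:, :k]
    return mu, w


def whiten(vecs, mu, w):
    """Apply the whitening transform x -> (x - mu) W."""
    return (vecs - mu) @ w


def cosine_matrix(a, b):
    """Pairwise cosine similarity between the rows of a (m, d) and b (n, d)."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T


def best_match_average(sim):
    """Hypothetical aggregation: average of each row's and each column's best match."""
    return 0.5 * (sim.max(axis=1).mean() + sim.max(axis=0).mean())


def chapter_similarity(sents_a, sents_b):
    """Chapter level: aggregate sentence-to-sentence cosine similarities."""
    return best_match_average(cosine_matrix(sents_a, sents_b))


def article_similarity(chapters_a, chapters_b):
    """Article level: aggregate chapter-to-chapter similarities."""
    sim = np.array([[chapter_similarity(ca, cb) for cb in chapters_b]
                    for ca in chapters_a])
    return best_match_average(sim)


# Usage with random stand-ins for RoBERTa-WWM sentence vectors (hypothetical data).
rng = np.random.default_rng(0)
corpus = rng.normal(loc=3.0, size=(200, 16))  # shared offset mimics anisotropic embeddings
mu, w = fit_whitening(corpus)
z = whiten(corpus, mu, w)

doc_a = [z[0:5], z[5:12]]          # an article: a list of chapters, each (sentences, d)
doc_b = [z[0:5], z[5:12]]          # an identical copy of doc_a
doc_c = [z[100:105], z[105:112]]   # unrelated sentences
print(article_similarity(doc_a, doc_b))  # close to 1.0 for identical articles
print(article_similarity(doc_a, doc_c))  # noticeably lower
```

The best-match average is only one reasonable way to combine scores across levels; plain means or position-aligned chapter comparisons are alternatives, and the paper's own rule may differ. Passing `k` to `fit_whitening` reproduces the optional dimensionality reduction that BERT-whitening allows.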

Key words

text similarity/natural language processing/scientific research projects

Classification

Information Technology and Safety Science

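Cite this article

Yang Zheng, Fang Zhengyun, Li Tianjiao, Li Limin. Text Similarity Measurement Method of Scientific Research Projects Based on Hierarchical Depth Semantics[J]. Computer & Digital Engineering, 2024, 52(3): 795-801, 851.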

Funding

Supported by the General Program of the National Natural Science Foundation of China (Grant No. 61976173).

Computer & Digital Engineering

OA, CSTPCD

ISSN 1672-9722
