Computer & Digital Engineering (计算机与数字工程), 2024, Vol. 52, Issue 3: 795-801, 851, 8. DOI: 10.3969/j.issn.1672-9722.2024.03.028
Text Similarity Measurement Method of Scientific Research Projects Based on Hierarchical Depth Semantics (基于分层深度语义的科研项目文本相似度度量方法)
Abstract
Duplicate checking of research project documents is an important issue in academia, and text similarity measurement is a key step in it. Current text similarity measures for research projects are mainly based on string comparison or TF-IDF, neither of which takes the semantic features of the text into account. This paper proposes a hierarchical semantic similarity measurement method for the documents of electric power technology projects. The method uses the pre-trained model RoBERTa-WWM together with Whitening to extract the semantic features of sentences, and builds the hierarchical deep semantic similarity of project texts from cosine similarity. The hierarchy has three levels: similarity between sentences, similarity between chapters, and similarity between articles. The effectiveness of the Whitening method is demonstrated on the AFQMC dataset, and on 50 power technology project articles and their corresponding translated articles the proposed method is shown to outperform similarity measures based on string distance and TF-IDF.
Keywords
text similarity / natural language processing / duplicate checking of scientific research projects
Classification
Information Technology and Security Science
Cite this article
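Yang Zheng, Fang Zhengyun, Li Tianjiao, Li Limin. Text Similarity Measurement Method of Scientific Research Projects Based on Hierarchical Depth Semantics [J]. Computer & Digital Engineering, 2024, 52(3): 795-801, 851, 8.
Funding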
Supported by the General Program of the National Natural Science Foundation of China (Grant No. 61976173).
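
The pipeline outlined in the abstract (sentence embeddings, Whitening, cosine similarity, then aggregation from sentences to chapters to articles) can be illustrated with a minimal sketch. Several parts of it are assumptions rather than the authors' implementation: random vectors stand in for RoBERTa-WWM sentence embeddings, the whitening step uses the commonly used formulation (mean-centering followed by an SVD of the covariance matrix), and the aggregation rule (average of best matches, symmetrized) is a guess, since the abstract does not state how the three levels are combined. All function names are hypothetical.

```python
import numpy as np


def fit_whitening(embeddings):
    """Estimate mu and W so that (x - mu) @ W has (near) identity covariance."""
    mu = embeddings.mean(axis=0, keepdims=True)
    cov = np.cov(embeddings - mu, rowvar=False)
    u, s, _ = np.linalg.svd(cov)
    # Small epsilon guards against division by a zero singular value.
    w = u @ np.diag(1.0 / np.sqrt(s + 1e-9))
    return mu, w


def whiten(embeddings, mu, w):
    """Apply a fitted whitening transform to a (n_sentences, dim) array."""
    return (embeddings - mu) @ w


def cosine_matrix(a, b):
    """Pairwise cosine similarity between the rows of a and the rows of b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T


def chapter_similarity(chap_a, chap_b):
    """Sentence level -> chapter level.

    ASSUMPTION: each sentence is matched to its most similar sentence in the
    other chapter, the best-match scores are averaged, and both directions
    are combined so the score is symmetric.
    """
    sims = cosine_matrix(chap_a, chap_b)
    return 0.5 * (sims.max(axis=1).mean() + sims.max(axis=0).mean())


def article_similarity(art_a, art_b):
    """Chapter level -> article level, using the same assumed aggregation."""
    chap = np.array([[chapter_similarity(a, b) for b in art_b] for a in art_a])
    return 0.5 * (chap.max(axis=1).mean() + chap.max(axis=0).mean())


# Toy data: an article is a list of chapters, a chapter is an array of
# sentence vectors. Random vectors replace real RoBERTa-WWM embeddings.
rng = np.random.default_rng(0)
dim = 8
article_a = [rng.normal(size=(5, dim)), rng.normal(size=(4, dim))]
# Near-duplicate of article_a (small perturbation of every sentence vector).
article_b = [c + 0.01 * rng.normal(size=c.shape) for c in article_a]
# Unrelated article.
article_c = [rng.normal(size=(5, dim)), rng.normal(size=(4, dim))]

# Fit the whitening transform on all available sentence vectors.
all_sentences = np.vstack(article_a + article_b + article_c)
mu, w = fit_whitening(all_sentences)


def whiten_article(article):
    return [whiten(chapter, mu, w) for chapter in article]


wa, wb, wc = (whiten_article(x) for x in (article_a, article_b, article_c))
sim_ab = article_similarity(wa, wb)  # near-duplicate pair
sim_ac = article_similarity(wa, wc)  # unrelated pair
print(f"near-duplicate: {sim_ab:.3f}, unrelated: {sim_ac:.3f}")
```

In this toy setting the near-duplicate pair scores far higher than the unrelated pair, which is the behaviour a duplicate-checking system needs. With real data, the random vectors would be replaced by pooled encoder outputs, and the aggregation rule would need to follow the definitions in the full paper.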