
Text similarity computing based on lexical semantic information (基于词汇语义信息的文本相似度计算)

Application Research of Computers (计算机应用研究), 2018, Vol. 35, Issue 2: 391-395, 5. DOI: 10.3969/j.issn.1001-3695.2018.02.016

Text similarity computing based on lexical semantic information

谷重阳¹, 徐浩煜², 周晗², 张俊杰²

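Author information

1. School of Communication and Information Engineering, Shanghai University, Shanghai 200444
2. Research Center for New Media and Wireless Technology, Shanghai Advanced Research Institute, Chinese Academy of Sciences, Shanghai 200120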

Abstract

Traditional text similarity computation is usually based on word matching, which ignores the semantic information of words, so its results are limited by how many words the two texts share. Distributed word vectors can effectively express semantic relations between words, but most word-vector-based text processing methods represent a text as a sequence of words. To address these problems, this paper proposes a new method for calculating text similarity. The method assumes that the elements of a text vector are correlated and that these correlations can be expressed by the semantic similarity of the corresponding words. Word similarity is therefore used to improve the cosine formula. The method is compared with three other methods on three popular datasets. The experimental results show that the proposed method outperforms the other methods in F1 score and accuracy.
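The abstract does not give the improved cosine formula itself. As a rough illustration of the general idea only, the sketch below assumes a "soft cosine" form, in which the dot products of the ordinary cosine are replaced by sums weighted by a word-to-word similarity matrix. The toy vocabulary, the two-dimensional word vectors, and the function names are hypothetical and are not taken from the paper.

```python
import math

def cosine(u, v):
    """Plain cosine similarity between two word vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def soft_cosine(x, y, sim):
    """Cosine between text vectors x and y, with cross terms weighted by sim[i][j].

    Assumed form (not confirmed by the source):
    sum_ij sim[i][j]*x[i]*y[j] / (sqrt(sum_ij sim*x*x) * sqrt(sum_ij sim*y*y))
    """
    def form(a, b):
        return sum(sim[i][j] * a[i] * b[j]
                   for i in range(len(a)) for j in range(len(b)))
    # max(..., 0.0) guards against tiny negative values from rounding.
    denom = math.sqrt(max(form(x, x), 0.0)) * math.sqrt(max(form(y, y), 0.0))
    return form(x, y) / denom if denom else 0.0

# Hypothetical vocabulary and made-up 2-d word vectors.
vocab = ["car", "automobile", "banana"]
word_vecs = {"car": [1.0, 0.1], "automobile": [0.9, 0.2], "banana": [0.0, 1.0]}

# Word-to-word similarity matrix built from the word vectors.
sim = [[cosine(word_vecs[a], word_vecs[b]) for b in vocab] for a in vocab]

# Text vectors over the vocabulary (TF-IDF weights in practice; plain counts here).
text_a = [1.0, 0.0, 0.0]  # a text containing only "car"
text_b = [0.0, 1.0, 0.0]  # a text containing only "automobile"

# With the identity matrix the formula reduces to the ordinary cosine.
identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
plain = soft_cosine(text_a, text_b, identity)
soft = soft_cosine(text_a, text_b, sim)
print(plain, soft)
```

In this toy case the two texts share no words, so the ordinary cosine is zero, while the similarity-weighted version is high because "car" and "automobile" have similar word vectors. This is the word-overlap limitation of pure word matching that the abstract describes.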

Key words

text similarity/word embedding/TF-IDF

Classification

Information technology and security science

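Cite this article

谷重阳, 徐浩煜, 周晗, 张俊杰. Text similarity computing based on lexical semantic information [J]. Application Research of Computers (计算机应用研究), 2018, 35(2): 391-395, 5.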

Application Research of Computers (计算机应用研究)

Indexing: OA, PKU Core (北大核心), CSCD, CSTPCD

ISSN: 1001-3695
