Application Research of Computers (计算机应用研究), 2018, Vol. 35, Issue 2: 391-395, 5. DOI: 10.3969/j.issn.1001-3695.2018.02.016
Text similarity computing based on lexical semantic information
谷重阳 1, 徐浩煜 2, 周晗 2, 张俊杰 2
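Author information
- 1. School of Communication and Information Engineering, Shanghai University, Shanghai 200444
- 2. New Media Wireless Technology Research Center, Shanghai Advanced Research Institute, Chinese Academy of Sciences, Shanghai 200120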
Abstract
Traditional text similarity computation is usually based on word matching, which ignores the semantic information of words, so the results are limited by how many words the two texts share. Distributed word vectors can effectively express semantic relations between words, but most text-processing methods based on word vectors represent a text as a sequence of words. To address these problems, this paper proposes a new method for computing text similarity. The method assumes that the elements of a text vector are correlated and that these correlations can be expressed by the semantic similarity of words; word similarity is therefore used to improve the cosine formula. The method is compared with three other methods on three popular datasets. The experimental results show that the proposed method outperforms the other methods in F1 score and accuracy.
Keywords
text similarity / word embedding / TF-IDF
Classification
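Information technology and security science
Cite this article
谷重阳, 徐浩煜, 周晗, 张俊杰. Text similarity computing based on lexical semantic information [J]. Application Research of Computers, 2018, 35(2): 391-395, 5.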
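The abstract describes improving the cosine formula with word similarity but does not give the formula itself. The sketch below illustrates one common formulation of that idea, the soft cosine measure, and should not be read as the authors' exact method. The toy vocabulary, the two-dimensional embeddings, the document weight vectors, and the function names `word_similarity_matrix`, `soft_cosine`, and `plain_cosine` are all assumptions made for illustration; in practice the weights would be TF-IDF values and the embeddings would come from a trained word-vector model.

```python
import numpy as np


def word_similarity_matrix(embeddings: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between word vectors, clipped to [0, 1]."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / norms
    return np.clip(unit @ unit.T, 0.0, 1.0)


def soft_cosine(a: np.ndarray, b: np.ndarray, S: np.ndarray) -> float:
    """Cosine similarity in which term i and term j interact with weight S[i, j].

    Assumes a and b hold non-negative term weights (e.g. TF-IDF), so the
    quadratic forms under the square roots cannot be negative.
    """
    numerator = a @ S @ b
    denominator = np.sqrt(a @ S @ a) * np.sqrt(b @ S @ b)
    return float(numerator / denominator) if denominator > 0 else 0.0


def plain_cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Ordinary cosine: the special case where S is the identity matrix."""
    return soft_cosine(a, b, np.eye(len(a)))


# Hypothetical 3-word vocabulary with made-up 2-d embeddings.
vocab = ["car", "automobile", "banana"]
embeddings = np.array([[1.0, 0.1], [0.9, 0.2], [0.0, 1.0]])
S = word_similarity_matrix(embeddings)

# Hypothetical term-weight vectors; each document contains a single word.
doc_a = np.array([1.0, 0.0, 0.0])  # "car"
doc_b = np.array([0.0, 1.0, 0.0])  # "automobile"
doc_c = np.array([0.0, 0.0, 1.0])  # "banana"

print("plain cosine (car vs automobile):", plain_cosine(doc_a, doc_b))
print("soft cosine  (car vs automobile):", soft_cosine(doc_a, doc_b, S))
print("soft cosine  (car vs banana):    ", soft_cosine(doc_a, doc_c, S))
```

With no shared words, the plain cosine of the first two documents is zero, while the soft cosine stays high because their word vectors are close. This is the limitation of word matching that the abstract points to.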