Computer & Digital Engineering, 2016, Vol. 44, Issue 4: 567-571, 614, 6. DOI:10.3969/j.issn.1672-9722.2016.04.001
Application of Text Similarity Algorithms Based on Full-Text Retrieval
Abstract
Because useful information and knowledge cannot be found quickly and efficiently in large volumes of text data, text mining based on text similarity calculation has become an important research topic in data mining. This paper uses the Simhash algorithm and the VSM cosine algorithm to compute text similarity. First, the traditional VSM cosine algorithm and the Simhash algorithm are used to calculate the degree of similarity n (0 < n < 1) between texts, with the cosine formula evaluated through the inner product. Finally, to implement the cosine algorithm and improve system efficiency, containers such as Map, Set, and Vector are used extensively, together with the inner product method. The experimental results show that the VSM cosine algorithm, because of its limitations, is not well suited to text similarity calculation, whereas the Simhash algorithm offers high accuracy and feasibility.
Keywords
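text similarity / cosine VSM / Simhash
Classification
Information technology and security science
Cite this article
王格, 吴钊, 李向.. Application of Text Similarity Algorithms Based on Full-Text Retrieval [J]. Computer & Digital Engineering, 2016, 44(4): 567-571, 614, 6.
Funding
Supported by the National Natural Science Foundation of China project "Research on Fast Optimization Methods for Highly Reliable Service Composition" (No. 61172084).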
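
The abstract names two techniques: VSM cosine similarity computed through an inner product, and Simhash. The paper's own implementation is not reproduced on this page, so the following is only a minimal sketch of both ideas. The tokenizer, the use of raw term frequencies as weights, the MD5-derived 64-bit term hash, and all function names are assumptions made for illustration, not details taken from the paper.

```python
import hashlib
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens (assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(text_a, text_b):
    """VSM cosine: inner product of term-frequency vectors divided by their norms."""
    vec_a, vec_b = Counter(tokenize(text_a)), Counter(tokenize(text_b))
    dot = sum(count * vec_b[term] for term, count in vec_a.items())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction in the vector space
    return dot / (norm_a * norm_b)


def simhash(text, bits=64):
    """Simhash fingerprint: each term votes +weight or -weight on every bit."""
    weights = [0] * bits
    for term, count in Counter(tokenize(text)).items():
        # Assumed term hash: the leading bytes of the MD5 digest.
        digest = hashlib.md5(term.encode("utf-8")).digest()
        term_hash = int.from_bytes(digest[: bits // 8], "big")
        for i in range(bits):
            weights[i] += count if (term_hash >> i) & 1 else -count
    fingerprint = 0
    for i, weight in enumerate(weights):
        if weight > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a, b):
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def simhash_similarity(text_a, text_b, bits=64):
    """Map the Hamming distance between fingerprints to a score in [0, 1]."""
    distance = hamming_distance(simhash(text_a, bits), simhash(text_b, bits))
    return 1.0 - distance / bits
```

In this sketch the cosine score requires the full term vectors of both texts, while Simhash reduces each text to a fixed-width fingerprint that can be stored once and compared with a single XOR. That difference is one common reason to prefer Simhash for retrieval over large collections.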