Application of Text Similarity Algorithms Based on Full-Text Retrieval

Wang Ge (王格), Wu Zhao (吴钊), Li Xiang (李向)

Computer & Digital Engineering, 2016, Vol. 44, Issue 4: 567-571, 614, 6. DOI: 10.3969/j.issn.1672-9722.2016.04.001

Application of Text Similar Algorithm Based on Full-text Retrieval

Wang Ge¹, Wu Zhao², Li Xiang¹

Author information

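  • 1. School of Mathematics and Computer Science, Hubei University of Arts and Science, Xiangyang 441053
  • 2. School of Computer Science, China University of Geosciences (Wuhan), Wuhan 430074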

Abstract

In large volumes of text data, it is difficult to find useful information and knowledge quickly and efficiently, so text data mining based on text similarity calculation has become an important research topic in the field of data mining. In this paper, the Simhash and VSM cosine algorithms are used to implement text similarity calculation. First, the traditional VSM cosine algorithm and the Simhash algorithm are adopted to calculate the degree of similarity n (0 < n < 1) between texts, using the inner product in accordance with the cosine formula. Finally, to implement the cosine algorithm and improve the efficiency of the system, a large number of containers, such as Map, Set, and Vector, are used together with the inner product method. The experimental results show that the VSM cosine algorithm, due to its limitations, is not suitable for text similarity calculation, whereas the Simhash algorithm has high accuracy and feasibility.
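
The two measures named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: whitespace tokenization, raw term-frequency weights, MD5 as the per-token hash, and the 64-bit fingerprint width are all assumptions made for illustration, and `Counter` merely stands in for the Map container the abstract mentions.

```python
import hashlib
import math
from collections import Counter


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine of the angle between two term-frequency vectors (VSM)."""
    # Assumption: tokens are lowercase whitespace-separated words.
    tf_a = Counter(text_a.lower().split())
    tf_b = Counter(text_b.lower().split())
    # Inner product, computed over the shared vocabulary only.
    dot = sum(tf_a[t] * tf_b[t] for t in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(c * c for c in tf_a.values()))
    norm_b = math.sqrt(sum(c * c for c in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction to compare
    return dot / (norm_a * norm_b)


def simhash(text: str, bits: int = 64) -> int:
    """Simhash fingerprint: a weighted per-bit vote over token hashes.

    Assumption: `bits` is a multiple of 8 and at most 128 (MD5 width).
    """
    votes = [0] * bits
    for token, weight in Counter(text.lower().split()).items():
        # MD5 serves only as a stable token hash here, not for security.
        digest = hashlib.md5(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[: bits // 8], "big")
        for i in range(bits):
            votes[i] += weight if (h >> i) & 1 else -weight
    # Bit i of the fingerprint is set when its vote total is positive.
    return sum(1 << i for i in range(bits) if votes[i] > 0)


def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")
```

Under these assumptions, two texts are compared either by `cosine_similarity(a, b)`, where values closer to 1 mean more similar, or by `hamming_distance(simhash(a), simhash(b))`, where a smaller distance means more similar. Any decision threshold would have to be chosen by the caller.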

Key words

text similarity/cosine VSM/Simhash

Classification

Information Technology and Security Science

Cite this article

Wang Ge, Wu Zhao, Li Xiang. Application of Text Similar Algorithm Based on Full-text Retrieval [J]. Computer & Digital Engineering, 2016, 44(4): 567-571, 614, 6.

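Funding

Supported by the National Natural Science Foundation of China project "Research on Fast Optimization Methods for Highly Reliable Service Composition" (No. 61172084).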

Computer & Digital Engineering

OA, CSTPCD

ISSN 1672-9722
