
Text-similarity method based on entropy

Li Shengwen¹, Ling Wei¹, Gong Junfang¹, Zhou Changzheng²


Application Research of Computers, 2016, Vol. 33, Issue (3): 665-668, 4. DOI: 10.3969/j.issn.1001-3695.2016.03.006



Author information

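1. School of Information Engineering, China University of Geosciences, Wuhan 430074, China
2. State Grid Shiyan Power Supply Company, Shiyan, Hubei 442000, China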


Abstract

Text comparison is the process of finding the similarity between two texts; the higher the similarity, the more alike the two texts are. Traditional methods measure similarity at the level of individual characters and ignore the contribution that multiple common strings shared by the texts make to their similarity. To address this problem, this paper proposes a text-similarity method based on entropy. The method first extracts the common strings of the texts, then builds measurement dimensions from these common substrings, and finally computes the similarity based on entropy. Experiments show that the method yields a smoother similarity curve, indicating that the algorithm is effective and accurate.
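The abstract contrasts the proposed method with traditional character-level measures, and the keywords name two of them: Levenshtein (edit) distance and the longest common subsequence (LCS). For reference only, the minimal sketch below computes those two baselines and normalizes each to a score in [0, 1] by dividing by the length of the longer text; that normalization, the function names, and the "two empty strings are identical" convention are assumptions for illustration. It is not the authors' entropy-based method, whose details are not given here.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when characters match)
            ))
        prev = curr
    return prev[-1]


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence (characters need not be contiguous)."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def edit_similarity(a: str, b: str) -> float:
    """Assumed normalization: 1 - distance / longer length; 1.0 for two empty strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def lcs_similarity(a: str, b: str) -> float:
    """Assumed normalization: LCS length / longer length; 1.0 for two empty strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else lcs_length(a, b) / longest


if __name__ == "__main__":
    # "kitten" -> "sitting" needs 3 edits; their LCS is "ittn" (length 4).
    print(levenshtein("kitten", "sitting"), lcs_length("kitten", "sitting"))
    print(edit_similarity("kitten", "sitting"), lcs_similarity("kitten", "sitting"))
```

Both baselines treat each text as a single character sequence: edit distance counts character operations, and LCS keeps only one best common subsequence. Neither weighs the several separate common substrings that, according to the abstract, the proposed method extracts and measures.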


Key words

text similarity/string matching/Levenshtein distance algorithm/longest common subsequence

Classification

Information Technology and Security Science

Cite this article

Li Shengwen, Ling Wei, Gong Junfang, Zhou Changzheng. Text-similarity method based on entropy [J]. Application Research of Computers, 2016, 33(3): 665-668, 4.

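Funding

National Natural Science Foundation of China (61272470); Fundamental Research Funds for the Central Universities, China University of Geosciences (Wuhan) ()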

Application Research of Computers

Indexed in: OA; Peking University Core Journals; CSCD; CSTPCD

ISSN: 1001-3695
