Application Research of Computers (计算机应用研究), 2016, Vol. 33, Issue (3): 665-668, 4. DOI: 10.3969/j.issn.1001-3695.2016.03.006
Text-similarity method based on entropy (一种基于熵的文本相似性计算方法)
Abstract
Text comparison is the process of determining how similar two texts are; the higher the similarity, the more alike the two texts. Traditional methods measure similarity at the level of individual characters and ignore the contribution of the multiple common strings that two texts share. To address this problem, this paper proposes an entropy-based text-similarity method. The method extracts the common strings from the texts, establishes a measurement dimension based on these common substrings, and calculates the similarity based on entropy. Experiments show that the method yields a smoother similarity curve, indicating that the algorithm is effective and accurate.
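The abstract contrasts character-level measures (such as the Levenshtein distance listed in the keywords) with a measure built on the common strings two texts share, but it does not state the paper's entropy formula. The sketch below is therefore only an illustration under stated assumptions: `char_similarity` is the standard normalized edit-distance baseline, while `common_strings` (greedy, longest-first extraction with a hypothetical `min_len=2`) and `entropy_similarity` (a hypothetical score equal to the common-string coverage divided by one plus the Shannon entropy of the common-string length distribution) are invented for illustration and are not the authors' method.

```python
import math
from typing import List, Sequence, Tuple


def levenshtein(a: str, b: str) -> int:
    """Classic edit distance (insert/delete/substitute, unit cost), O(len(a)*len(b))."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute or match
        prev = curr
    return prev[-1]


def char_similarity(a: str, b: str) -> float:
    """Character-level baseline: 1 - distance / longer length."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


# Sentinel for consumed positions in b; consumed positions in a become None,
# so a consumed position never compares equal to anything on the other side.
_MASK_B = object()


def _longest_common_run(a: Sequence, b: Sequence) -> Tuple[int, int, int]:
    """Return (length, end index in a, end index in b) of the longest common run."""
    best_len, best_a, best_b = 0, 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_a, best_b = curr[j], i, j
        prev = curr
    return best_len, best_a, best_b


def common_strings(a: str, b: str, min_len: int = 2) -> List[str]:
    """Hypothetical helper: greedily extract non-overlapping common substrings,
    longest first, stopping when the longest remaining run is shorter than min_len."""
    min_len = max(1, min_len)  # guard against an endless loop on zero-length runs
    xs: list = list(a)
    ys: list = list(b)
    found: List[str] = []
    while True:
        length, end_a, end_b = _longest_common_run(xs, ys)
        if length < min_len:
            return found
        found.append(a[end_a - length:end_a])
        for k in range(end_a - length, end_a):
            xs[k] = None        # consume the run in a
        for k in range(end_b - length, end_b):
            ys[k] = _MASK_B     # consume the run in b


def entropy_similarity(a: str, b: str, min_len: int = 2) -> float:
    """Hypothetical score: coverage of the common strings, damped by the Shannon
    entropy (bits) of their length distribution, so fragmented overlap scores lower."""
    if not a and not b:
        return 1.0
    pieces = common_strings(a, b, min_len)
    total = sum(len(s) for s in pieces)
    if total == 0:
        return 0.0
    coverage = 2 * total / (len(a) + len(b))
    probs = [len(s) / total for s in pieces]
    entropy = -sum(p * math.log2(p) for p in probs)
    return coverage / (1.0 + entropy)
```

For example, `"abcxyz"` and `"xyzabc"` share all six characters, but only as the two separate runs `"abc"` and `"xyz"`; the coverage is 1.0, the entropy of the two equal-length pieces is 1 bit, and the hypothetical score is 0.5, whereas identical strings score 1.0. The division by `1 + entropy` is just one simple way to penalize fragmentation and is an assumption of this sketch.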
Keywords
text similarity/string matching/Levenshtein distance algorithm/longest common subsequence
Classification
Information Technology and Security Science
Cite this article
Li Shengwen, Ling Wei, Gong Junfang, Zhou Changzheng. Text-similarity method based on entropy[J]. Application Research of Computers, 2016, 33(3): 665-668, 4.
Funding
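National Natural Science Foundation of China (61272470); Fundamental Research Funds for the Central Universities, China University of Geosciences (Wuhan) ()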