Journal of Jinggangshan University (Natural Science Edition), 2013, Issue (3): 41-44,4. DOI: 10.3969/j.issn.1674-8085.2013.03.010
An Improved Mutual Information Algorithm Based on Word Frequency and Text Category
Abstract
This paper analyzes the shortcomings of the mutual information (MI) algorithm. To address the problem that low-frequency features may receive high weights, we use two indicators of strongly informative features, word frequency and concentration ratio, and propose an improved MI algorithm based on word frequency and text category (MIFC). Experimental results show that MIFC achieves higher accuracy than the traditional MI algorithm.
Key words
mutual information/feature selection/word frequency/text category/MIFC
Classification
Information Technology and Security Science
Cite this article
Xie Li, Li Guangyao, Tan Yunlan. An improved mutual information algorithm based on word frequency and text category [J]. Journal of Jinggangshan University (Natural Science Edition), 2013, (3): 41-44,4.
Funding
International Cooperation Fund Project of the Science and Technology Commission of Shanghai Municipality (10510712500)
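The low-frequency problem mentioned in the abstract can be illustrated with the classical document-count estimate of MI used in text feature selection, MI(t, c) ≈ log(A·N / ((A + C)(A + B))). The sketch below is not the MIFC algorithm, whose formula is not given on this page; the function name, the count variables, and the sample numbers are all illustrative assumptions.

```python
import math

def mutual_information(a, b, c, n):
    """Classical MI between a term and a category, estimated from document counts.

    a: documents in the category that contain the term
    b: documents outside the category that contain the term
    c: documents in the category that do not contain the term
    n: total number of documents
    (Hypothetical helper for illustration; not taken from the paper.)
    """
    if a == 0:
        # The term never co-occurs with the category, so log(0) is -infinity.
        return float("-inf")
    return math.log((a * n) / ((a + c) * (a + b)))

# Assumed toy corpus: 1000 documents, 100 of them in the category.
# A rare term that appears in a single document of the category ...
rare = mutual_information(a=1, b=0, c=99, n=1000)
# ... versus a frequent term found in 80 of the category's 100 documents.
frequent = mutual_information(a=80, b=20, c=20, n=1000)

print(rare > frequent)
```

Under these assumed counts, the term seen in only one document scores higher than the term that covers most of the category, which is the bias that weighting by word frequency and concentration is meant to correct.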