Computer Engineering, 2012, Vol. 38, Issue (8): 37-40, 4. DOI: 10.3969/j.issn.1000-3428.2012.08.013
TFIDF Algorithm Based on Information Gain and Information Entropy
Abstract
The classical Term Frequency and Inverse Document Frequency (TFIDF) algorithm ignores how terms are distributed within categories and between categories of a text collection. To address this problem, this paper introduces information entropy to improve the TFIDF algorithm based on information gain (TFIDFIG), and proposes a TFIDF algorithm based on information gain and information entropy (TFIDFIGE). Experimental results show that TFIDFIGE achieves higher precision and recall than the traditional TFIDF and TFIDFIG algorithms.
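The abstract does not give the TFIDFIG or TFIDFIGE weighting formulas, so the following is only a minimal Python sketch of the three ingredients it names: classical TFIDF, the information gain of a term, and the entropy of a term's distribution over categories. It assumes each document is a list of tokens with exactly one category label, uses idf = log(N / n_t) as one common variant, and `tfidf_ig_ge` (TFIDF multiplied by IG and by a hypothetical concentration factor 1 - H_t / log|C|) is an illustrative combination, not the authors' formula.

```python
import math
from collections import Counter


def entropy(probs):
    """Shannon entropy (natural log); zero probabilities contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def tfidf(term, doc, docs):
    """Classical TF-IDF: raw count of term in doc times log(N / n_t)."""
    n_t = sum(1 for d in docs if term in d)  # documents containing the term
    if n_t == 0:
        return 0.0
    return doc.count(term) * math.log(len(docs) / n_t)


def information_gain(term, docs, labels):
    """IG(t) = H(C) - P(t) H(C|t) - P(not t) H(C|not t), by document presence."""
    n = len(docs)
    with_t = [c for d, c in zip(docs, labels) if term in d]
    without_t = [c for d, c in zip(docs, labels) if term not in d]

    def h(group):
        # Entropy of the category labels in a group of documents.
        if not group:
            return 0.0
        counts = Counter(group)
        return entropy([k / len(group) for k in counts.values()])

    return (h(labels)
            - len(with_t) / n * h(with_t)
            - len(without_t) / n * h(without_t))


def category_entropy(term, docs, labels):
    """Entropy of the term's occurrence counts across categories.
    Low values mean the term is concentrated in few categories."""
    per_cat = Counter()
    for d, c in zip(docs, labels):
        per_cat[c] += d.count(term)
    total = sum(per_cat.values())
    if total == 0:
        return 0.0
    return entropy([k / total for k in per_cat.values()])


def tfidf_ig_ge(term, doc, docs, labels):
    """HYPOTHETICAL combination for illustration only (not the paper's formula):
    TFIDF * IG * (1 - H_t / log|C|)."""
    n_cats = len(set(labels))
    h_max = math.log(n_cats) if n_cats > 1 else 1.0
    concentration = 1.0 - category_entropy(term, docs, labels) / h_max
    return tfidf(term, doc, docs) * information_gain(term, docs, labels) * concentration


# Toy corpus: two "sport" and two "finance" documents.
docs = [["ball", "goal", "team"], ["goal", "match", "ball"],
        ["stock", "market", "price"], ["market", "price", "team"]]
labels = ["sport", "sport", "finance", "finance"]
print(tfidf_ig_ge("ball", docs[0], docs, labels))
print(tfidf_ig_ge("team", docs[0], docs, labels))
```

In this toy corpus "ball" and "team" have the same plain TFIDF in the first document (each occurs once and appears in two of four documents), but "ball" occurs only in sport documents and receives a weight of (ln 2)^2, about 0.48, while "team" is split evenly between the two categories and receives 0. This illustrates the kind of category-distribution information that, per the abstract, classical TFIDF ignores.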
Key words
text classification / information gain / information entropy / Term Frequency and Inverse Document Frequency (TFIDF)
Classification
Information Technology and Security Science
Cite this article
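LI Xueming, LI Hairui, XUE Liang, HE Guangjun. TFIDF algorithm based on information gain and information entropy[J]. Computer Engineering, 2012, 38(8): 37-40, 4.
Funding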
Supported by the Fundamental Research Funds for the Central Universities (CDJXS 11180009)