计算机应用与软件 (Computer Applications and Software), 2013, Vol. 30, Issue 8: 139-142, 4. DOI: 10.3969/j.issn.1000-386x.2013.08.037
文本分类中信息增益特征选择算法的改进
IMPROVING THE ALGORITHM OF INFORMATION GAIN FEATURE SELECTION IN TEXT CLASSIFICATION
Abstract
The feature selection algorithm has a great impact on the precision of a text classification system. The traditional information gain feature selection algorithm tends to select features that occur with low frequency in the designated category but with high frequency in other categories. To overcome this shortcoming, and based on an in-depth analysis of the traditional algorithm and related improved algorithms, we introduce a feature distribution difference factor together with inter-category and intra-category weighting factors, and propose an improved information gain algorithm based on feature distribution weighting. We evaluate it with two classification algorithms, the naive Bayes classifier and the support vector machine classifier. Experimental results demonstrate that the proposed algorithm outperforms the other improved algorithms.
Key words
Text classification / Feature selection / Information gain / Feature distribution weighting
Classification
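Information Technology and Security Science
Cite this article
Guo Song, Ma Fei. Improving the algorithm of information gain feature selection in text classification [J]. Computer Applications and Software, 2013, 30(8): 139-142, 4.
Funding
Basic and Frontier Technology Research Program of the Henan Provincial Department of Science and Technology (122300410281).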
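The abstract refers to the traditional information gain criterion without stating it. As a rough illustration only, the sketch below computes the standard textbook information gain of a term over a labelled document collection: the category entropy minus the expected entropy after splitting on the term's presence or absence. The function names, the set-of-terms document representation, and the toy data are assumptions made for this example. The feature distribution difference factor and the inter-category and intra-category weights proposed in the paper are not specified in the abstract and are therefore not implemented here.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (base 2) of a list of non-negative counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in counts if n > 0)


def information_gain(docs, labels, term):
    """Textbook information gain of `term` (hypothetical helper).

    docs   -- list of sets of terms, one set per document
    labels -- list of category labels, aligned with docs
    """
    categories = sorted(set(labels))
    all_counts = Counter(labels)
    # Category counts for documents that contain / do not contain the term.
    with_counts = Counter(lab for d, lab in zip(docs, labels) if term in d)
    without_counts = Counter(lab for d, lab in zip(docs, labels) if term not in d)

    n = len(docs)
    n_with = sum(with_counts.values())
    n_without = n - n_with

    h_prior = entropy([all_counts[c] for c in categories])
    h_with = entropy([with_counts[c] for c in categories])
    h_without = entropy([without_counts[c] for c in categories])
    # Prior entropy minus the expected entropy after observing the term.
    return h_prior - (n_with / n) * h_with - (n_without / n) * h_without


# Toy collection (made up for illustration, not data from the paper).
docs = [{"ball", "goal"}, {"ball"}, {"vote"}, {"vote", "goal"}]
labels = ["sport", "sport", "politics", "politics"]

ig_ball = information_gain(docs, labels, "ball")
ig_goal = information_gain(docs, labels, "goal")
```

Because this score treats the presence and the absence of a term symmetrically, a term that never appears in a given category can still rank highly, which is consistent with the shortcoming the abstract describes.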