Improved Algorithm of Text Feature Weighting Based on Information Gain
(基于信息增益的文本特征权重改进算法)

Li Kaiqi, Diao Xingchun, Cao Jianjun

Computer Engineering (计算机工程), 2011, Vol. 37, Issue 1, pp. 16-18, 21 (4 pages). DOI: 10.3969/j.issn.1000-3428.2011.01.006

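Author information

  • 1. Institute of Command Automation, PLA University of Science and Technology, Nanjing 210007, China (Li Kaiqi)
  • 2. The 63rd Research Institute of the PLA General Staff Headquarters, Nanjing 210007, China (Diao Xingchun, Cao Jianjun)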

Abstract

The idf component of the traditional tf.idf algorithm evaluates a feature's ability to discriminate between documents only at a macroscopic level. It cannot reflect how differently a feature is distributed across the individual documents and across the classes of the training set, which reduces the accuracy of text representation. To solve this problem, this paper proposes an improved feature weighting method, tf.igt.igc. Starting from an analysis of the characteristics of feature distributions, the method introduces the concept of information gain from information theory so that both specific dimensions of a feature's distribution, across documents and across classes, are taken into account, overcoming the shortcomings of the traditional formula. Experimental results on two open-source corpora show that, compared with two other feature weighting methods, tf.igt.igc computes feature weights more effectively.
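To make the contrast in the abstract concrete, the following Python sketch computes the two standard ingredients it mentions: the classic tf.idf weight and the information gain of a term with respect to the class labels. It is not the authors' tf.igt.igc formula, which this page does not state. The function names, the token-list document representation, the natural-log idf, and the "document contains the term" event used for information gain are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def tf_idf(term, doc, corpus):
    """Classic tf.idf: raw count of `term` in `doc` times log(N / df)."""
    tf = doc.count(term)                          # term frequency in this document
    df = sum(1 for d in corpus if term in d)      # number of documents containing the term
    if df == 0:
        return 0.0
    return tf * math.log(len(corpus) / df)


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(term, corpus, labels):
    """IG(C; t) = H(C) - [P(t) H(C | t) + P(~t) H(C | ~t)],
    where t is the event "the document contains the term"."""
    n = len(labels)
    with_t = [y for d, y in zip(corpus, labels) if term in d]
    without_t = [y for d, y in zip(corpus, labels) if term not in d]
    conditional = (len(with_t) / n) * entropy(with_t) + (len(without_t) / n) * entropy(without_t)
    return entropy(labels) - conditional


# Hypothetical toy corpus: each document is a list of tokens with one class label.
corpus = [["ball", "goal", "team"], ["goal", "match"],
          ["stock", "market"], ["market", "price", "stock"]]
labels = ["sport", "sport", "finance", "finance"]

print(information_gain("goal", corpus, labels))            # → 1.0
print(round(information_gain("team", corpus, labels), 3))  # → 0.311
```

On this toy data, idf alone ranks "team" (log(4/1)) above "goal" (log(4/2)) because it only counts how many documents contain a term, whereas information gain ranks "goal" higher because every document containing it belongs to a single class. That blindness to class distribution is the kind of limitation the abstract attributes to idf.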

Keywords

feature distribution / feature weighting / text classification

Classification

Information Technology and Security Science

How to cite

Li Kaiqi, Diao Xingchun, Cao Jianjun. Improved algorithm of text feature weighting based on information gain[J]. Computer Engineering, 2011, 37(1): 16-18, 21.

Funding

China Postdoctoral Science Foundation (20090461425)

Jiangsu Planned Projects for Postdoctoral Research Funds (0901014B)

Journal: Computer Engineering

Indexed in: OA, CSCD, CSTPCD

ISSN: 1000-3428