
Improvement of Algorithm for Weight of Characteristic Item in Text Classification

GONG Jing, HU Pingxia, HU Can

Computer Technology and Development, 2014, Issue 9: 128-132 (5 pages). DOI: 10.3969/j.issn.1673-629X.2014.09.029


Author information

  • 1. Department of Information Technology, Hunan Environmental Biological Polytechnic (湖南环境生物职业技术学院), Hengyang, Hunan 421005, China

Abstract

The TF-IDF algorithm is a commonly used method for calculating feature weights in text classification. However, TF-IDF considers only how often a feature occurs in a text and how frequently it appears in the training set; it does not take into account how the feature is distributed across the classes, nor any semantic information about the feature. To address this problem, an improved TF-IDF algorithm is proposed that considers not only the distribution of a feature within each class but also semantic factors such as the position of the feature and the length of the feature. The improved algorithm better reflects the importance of feature items, and its effectiveness is verified with a Naïve Bayes classifier. Experimental results show that the proposed algorithm outperforms the standard TF-IDF algorithm and improves the accuracy of text classification.
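As a rough illustration of the contrast drawn in the abstract, the Python sketch below computes classic TF-IDF weights, tf(t, d) * log(N / df(t)), and then a hypothetical variant that multiplies each weight by three extra factors: how concentrated the term is in the document's own class, a boost when the term appears in the title (position), and a factor that grows with term length. The abstract does not give the authors' formula, so the function names `class_concentration` and `improved_weights`, the `title_boost` parameter and every factor definition here are assumptions for illustration only, not the paper's method.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Classic TF-IDF: weight(t, d) = tf(t, d) * log(N / df(t)).

    docs is a list of tokenised documents (lists of terms); tf is the raw
    term count normalised by document length.
    """
    n_docs = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # document frequency: count each term once per doc
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({t: (c / len(doc)) * math.log(n_docs / df[t])
                        for t, c in counts.items()})
    return weights


def class_concentration(term, docs, labels, label):
    """HYPOTHETICAL factor: share of the documents containing `term` that
    belong to class `label` (1.0 means the term occurs only in that class)."""
    hits = [y for doc, y in zip(docs, labels) if term in doc]
    return sum(y == label for y in hits) / len(hits) if hits else 0.0


def improved_weights(docs, labels, titles, title_boost=0.5):
    """HYPOTHETICAL improved TF-IDF, not the authors' formula.

    Multiplies classic TF-IDF by (a) the in-class concentration of the term,
    (b) a position factor 1 + title_boost if the term occurs in the title
    (titles is a list of term sets), and (c) a length factor log(1 + len(term)).
    """
    result = []
    for w, y, title in zip(tf_idf(docs), labels, titles):
        doc_weights = {}
        for t, v in w.items():
            position = 1 + title_boost if t in title else 1.0
            length = math.log(1 + len(t))
            doc_weights[t] = (v * class_concentration(t, docs, labels, y)
                              * position * length)
        result.append(doc_weights)
    return result
```

Because the class factor uses the training labels, this sketch only weights labelled training documents; how a variant like this would weight an unlabelled test document before it is classified (for example by a Naïve Bayes model) is left open and would need its own assumption.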

Key words

text classification/feature item/weights/improvement

Category

Information Technology and Security Science

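Cite this article

GONG Jing, HU Pingxia, HU Can. Improvement of Algorithm for Weight of Characteristic Item in Text Classification[J]. Computer Technology and Development, 2014, (9): 128-132.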

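Funding

Education Science and Technology Program of Hunan Province (07D036)

Project jointly funded by the Hunan Provincial Department of Education and the Hunan Provincial Department of Finance (12C1056)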

Computer Technology and Development

OA, CSTPCD

ISSN 1673-629X
