
An Improved Concentration and Dispersion Text Feature Selection Algorithm

Shen Youwen¹, Zhao Xinjian¹, Xu Jun¹

Computer Applications and Software, 2011, Vol. 28, Issue 9: 96-98, 125 (4 pages).

Author information

  • 1. College of Computer Science, Zhejiang University of Technology, Hangzhou, Zhejiang 310023

Abstract

The TFFS feature selection algorithm has some shortcomings: its concentration measure struggles to weight low-frequency terms accurately, and its dispersion measure ignores the effect on text classification of terms whose mutual information is negative. This paper proposes a modified feature selection algorithm, TFFSL. TFFSL improves both the concentration and dispersion measures to avoid these defects of TFFS, and it also incorporates term length information to strengthen the role of phrases and word expressions in text classification. Experimental results with SVM classification show that, compared with TFFS, TFFSL achieves better text classification performance and is better at eliminating irrelevant terms.

Keywords

Mutual information; Feature selection; Text classification; Feature weight; Support vector machine

Classification

Information Technology and Security Science

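Cite this article

Shen Youwen, Zhao Xinjian, Xu Jun. An improved concentration and dispersion text feature selection algorithm [J]. Computer Applications and Software, 2011, 28(9): 96-98, 125 (4 pages).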

Funding

Natural Science Foundation of Zhejiang Province (X105739)

Computer Applications and Software

OA, PKU Core (北大核心), CSCD, CSTPCD

ISSN 1000-386X
