
An Imbalanced Text Classification Method Based on Synonyms Expansion*

Yang Hongjun (杨鸿骏) 1, Zhou Yajian (周亚建) 2, Guo Yucui (郭玉翠) 1

情报杂志 (Journal of Intelligence), Issue (9): 204-207, 4.

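Author information

  • 1. Information Security Center, Beijing University of Posts and Telecommunications, Beijing 100876
  • 2. National Engineering Laboratory for Disaster Backup and Recovery, Beijing 100876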

Abstract

Traditional text categorization methods often degrade rapidly on imbalanced text, especially for minority classes. This paper introduces a new method based on synonym expansion to address imbalanced text classification. The method builds a synonym dictionary, determines expansion rules, and modifies the "Feature-Maintaining Factor", so that the feature items of minority classes are enriched while the changes introduced by the expansion are compensated. Experimental results show that categorization performance for minority classes improves substantially, and that the improvement becomes more pronounced as the number of texts in the minority classes decreases. Overall performance also improves to some degree.
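The three steps named in the abstract (a synonym dictionary, expansion rules, and a compensating factor) can be illustrated with a minimal sketch. The abstract gives no formulas, so everything concrete here is an assumption made for illustration: the toy dictionary `SYNONYMS`, the rule "add every listed synonym of every term", and the `keep_factor` weight, which is only a stand-in for the idea of compensation and is not the paper's "Feature-Maintaining Factor".

```python
from collections import Counter

# Hypothetical toy synonym dictionary; the paper's dictionary is not given.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "cheap": ["inexpensive"],
}


def expand_with_synonyms(tokens, synonyms, keep_factor=0.5):
    """Return weighted term frequencies for one document.

    Each original term counts 1.0 per occurrence. Each synonym of that
    term is added with weight `keep_factor` per occurrence, so expanded
    features do not outweigh the terms that were actually observed.
    `keep_factor` is an illustrative assumption, not the paper's factor.
    """
    weights = Counter()
    for tok in tokens:
        weights[tok] += 1.0
        for syn in synonyms.get(tok, []):
            weights[syn] += keep_factor
    return dict(weights)


def expand_minority(docs, labels, minority_label, synonyms, keep_factor=0.5):
    """Expand only documents of the minority class; leave the rest as plain counts."""
    expanded = []
    for tokens, label in zip(docs, labels):
        if label == minority_label:
            expanded.append(expand_with_synonyms(tokens, synonyms, keep_factor))
        else:
            expanded.append({t: float(c) for t, c in Counter(tokens).items()})
    return expanded


if __name__ == "__main__":
    docs = [["cheap", "car", "car"], ["big", "house"]]
    labels = ["rare", "common"]
    for vec in expand_minority(docs, labels, "rare", SYNONYMS):
        print(vec)
```

The resulting weighted vectors could then be passed to any term-weighting scheme and classifier. Down-weighting the added synonyms is one possible way to keep the original term distribution recognizable after expansion; the paper may compensate differently.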

Key words

text classification/imbalanced dataset/synonym-dictionary/term-frequency maintaining

Classification

Information Technology and Security Science

Cite this article

Yang Hongjun, Zhou Yajian, Guo Yucui. An Imbalanced Text Classification Method Based on Synonyms Expansion*[J]. 情报杂志, 2013, (9): 204-207, 4.

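Funding

Supported by the National Natural Science Foundation of China project "Research on Network Traffic Detection Technology Based on Behavior Analysis" (No. 60972077).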

情报杂志 (Journal of Intelligence)

Indexing: OA, 北大核心 (Peking University Core Journals), CHSSCD, CSSCI

ISSN 1002-1965
