Journal of Intelligence (情报杂志), Issue (9): 204-207, 4.
An Imbalanced Text Classification Method Based on Synonyms Expansion*
Abstract
The performance of traditional text categorization methods often deteriorates rapidly on imbalanced text, especially for minority classes. This paper introduces a new method based on synonym expansion to deal with imbalanced text classification. The feature terms of minority classes are enriched in three steps: establishing a synonym dictionary, determining the expansion rules, and modifying the "Feature-Maintaining Factor". At the same time, the changes introduced by the expansion are compensated for. The experimental results show that categorization performance for minority classes improves substantially. Moreover, the fewer texts the minority classes contain, the more significant the improvement. Overall performance also improves to some degree.
Keywords
text classification / imbalanced dataset / synonym dictionary / term-frequency maintaining
Classification
Information Technology and Security Science
Cite this article
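Yang Hongjun, Zhou Yajian, Guo Yucui. An Imbalanced Text Classification Method Based on Synonyms Expansion*[J]. Journal of Intelligence (情报杂志), 2013, (9): 204-207, 4.
Funding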
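Supported by the National Natural Science Foundation of China project "Research on Network Traffic Detection Technology Based on Behavior Analysis" (Grant No. 60972077).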