Research on a Text Data Mining Algorithm Based on Improved Clustering Average Information Content

Jin Jing (金菁)

Application Research of Computers (计算机应用研究), 2012, Vol. 29, Issue (3): 981-983, 3. DOI: 10.3969/j.issn.1001-3695.2012.03.049

Cluster based on average amount of information text categorization algorithm

Author information

Jin Jing¹

1. School of Software, Beijing Institute of Technology, Beijing 100081

Abstract

This paper studies the problem of text classification accuracy. When traditional clustering algorithms are applied to text classification, the text representation is high-dimensional and sparse, and synonyms and antonyms in particular interfere with classification, so classification accuracy is low. To address these problems, a text classification algorithm based on clustering and average information content is proposed. The algorithm analyzes the text space vector from an information-theoretic viewpoint: each text is treated as an information source, and the information in the text is accumulated from the occurrence counts of its individual features. Words and phrases with clear domain characteristics are taken as the clustering objects, and hierarchical clustering based on average information content is then used for feature extraction. Simulation results show that the proposed algorithm extracts text information effectively, improves classification accuracy, and has practical value.
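The information-theoretic view in the abstract (a text as an information source whose information is accumulated from feature counts) can be illustrated with a minimal sketch. It assumes that features are plain tokens and that "average information content" means the Shannon entropy of a text's feature-frequency distribution; the function names `accumulated_information`, `average_information`, and `feature_contributions` are hypothetical. It does not reproduce the paper's domain-word selection or hierarchical clustering, which the abstract does not specify.

```python
import math
from collections import Counter


def accumulated_information(tokens: list[str]) -> float:
    """Total information (bits) of a text treated as a memoryless source.

    Assumption: features are plain tokens. Each occurrence of feature w
    carries self-information -log2 p(w), where p(w) is w's relative
    frequency in this text; summing over occurrences accumulates the
    text's information from the per-feature counts.
    """
    counts = Counter(tokens)
    total = len(tokens)
    return sum((c * -math.log2(c / total) for c in counts.values()), 0.0)


def average_information(tokens: list[str]) -> float:
    """Average information per feature occurrence, i.e. Shannon entropy in bits."""
    if not tokens:
        return 0.0
    return accumulated_information(tokens) / len(tokens)


def feature_contributions(tokens: list[str]) -> dict[str, float]:
    """Each feature's share -p(w) log2 p(w) of the average information."""
    counts = Counter(tokens)
    total = len(tokens)
    return {w: -(c / total) * math.log2(c / total) for w, c in counts.items()}


if __name__ == "__main__":
    doc = ["cluster", "cluster", "entropy", "text"]
    print(accumulated_information(doc))  # 6.0
    print(average_information(doc))      # 1.5
```

Dividing the accumulated total by the number of occurrences recovers the entropy, so per-feature contributions sum to the text's average information content; a clustering step could, under these assumptions, use such per-feature values as its input.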

Key words

text categorization / hierarchical clustering / information quantity / simulation

Subject category

Information Technology and Security Science

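Cite this article

JIN Jing. Research on a text data mining algorithm based on improved clustering average information content[J]. Application Research of Computers, 2012, 29(3): 981-983, 3.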

Application Research of Computers

OA · Peking University Core Journal (北大核心) · CSCD · CSTPCD

ISSN: 1001-3695
