Abstract
This paper studies the accuracy of text classification. When traditional clustering algorithms are applied to text classification, the text representation is high-dimensional and sparse, and synonyms and antonyms in particular interfere with classification, so accuracy is low. To address these problems, a text classification algorithm based on cluster mean information content is proposed. The algorithm analyzes the text vector space from the viewpoint of information theory: each text is treated as an information source, and its information is accumulated from the number of times each feature occurs. Words and phrases with clear domain characteristics are taken as the clustering objects, and features are then extracted using the hierarchical average information content. Simulation results show that the proposed algorithm extracts text information effectively and improves classification accuracy, and that it has some practical value.
Keywords
text categorization / hierarchical clustering / information quantity / simulation
Classification
Information Technology and Security Science
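The idea summarized in the abstract, accumulating information from feature occurrence counts and averaging it over clusters of related terms, can be illustrated with a minimal sketch. This is not the paper's algorithm: the self-information estimate -log2 p(t), the hand-written term clusters, the toy corpus, and the function names `term_information` and `cluster_mean_information` are all assumptions made for illustration. In the paper the clusters would come from hierarchical clustering of domain-specific terms.

```python
import math
from collections import Counter


def term_information(docs):
    """Self-information -log2 p(t) per term, with p(t) estimated from corpus counts."""
    counts = Counter(term for doc in docs for term in doc)
    total = sum(counts.values())
    return {term: -math.log2(n / total) for term, n in counts.items()}


def cluster_mean_information(doc, clusters, info):
    """One feature per term cluster: the mean information of that cluster's
    term occurrences in the document (0.0 if none of its terms occur)."""
    tf = Counter(doc)
    features = []
    for cluster in clusters:
        known = [t for t in cluster if t in info]        # ignore terms unseen in the corpus
        occurrences = sum(tf[t] for t in known)
        accumulated = sum(tf[t] * info[t] for t in known)
        features.append(accumulated / occurrences if occurrences else 0.0)
    return features


# Toy corpus (assumed data): each document is a list of tokens.
docs = [
    ["stock", "market", "share", "price"],
    ["match", "goal", "team", "score"],
    ["share", "equity", "market", "team"],
]
# Assumed term clusters; near-synonyms such as "stock" and "share" fall into one cluster.
clusters = [{"stock", "share", "equity"}, {"match", "goal", "team", "score"}]

info = term_information(docs)
vectors = [cluster_mean_information(d, clusters, info) for d in docs]
```

Each document is reduced to one value per cluster rather than one per vocabulary term, which is how a representation of this kind would lower dimensionality and sparsity and map synonyms to the same feature. The resulting vectors could then be passed to any standard classifier.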