Computer Applications and Software (计算机应用与软件), 2013, Issue (12): 222-225, 287, 5. DOI: 10.3969/j.issn.1000-386x.2013.12.058
Research on a Novel Web Chinese Text Clustering Method
Ye Yufei (叶宇飞)¹, An Shiquan (安世全)², Dai Jin (代劲)³
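Author information
- 1. College of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing 400065
- 2. College of Mobile Telecommunications, Chongqing University of Posts and Telecommunications, Chongqing 400065
- 3. College of Computer Science and Technology, Chongqing University, Chongqing 400065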
Abstract
Traditional text clustering lacks semantic information, represents texts as high-dimensional sparse feature vectors, and ignores the particular characteristics of Web text. To address these problems, this paper proposes a Web Chinese text clustering method. Working in a HowNet-based concept space, the method filters out all terms except nouns, analyses the semantics of the important words in the text, and performs feature-set clustering on the label feature set and the text feature set. It then uses an improved TF-IDF algorithm to select features from these two sets, and finally represents the text as the union of the selected label features and text features. This reduces the feature dimensionality while still representing the text efficiently. Experimental results demonstrate the method's effectiveness.
Key words
Web text clustering/Feature dimension reduction/HowNet/Text similarity
Classification
Information Technology and Safety Science
Cite this article
Ye Yufei, An Shiquan, Dai Jin. Research on a novel Web Chinese text clustering method[J]. Computer Applications and Software, 2013, (12): 222-225, 287, 5.
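The abstract outlines the pipeline (keep nouns, map words to concepts, select features from a label set and a text set, represent each text by the union) but does not give the authors' improved TF-IDF formula, their HowNet-based semantic analysis, or their similarity measure. The Python sketch below illustrates only the general idea, under assumptions that are not from the paper. Tokens are assumed to arrive pre-segmented with POS tags whose noun tags start with `n`. A tiny hypothetical dictionary `CONCEPT_OF` stands in for HowNet (电脑 and 计算机 both mean "computer"; 手机 and 移动电话 both mean "mobile phone"). Standard smoothed TF-IDF replaces the paper's improved variant. The `label` field is assumed to be something like a page title. The parameters `k_label` and `k_body` and the cosine measure are illustrative choices.

```python
import math
from collections import Counter

# Hypothetical stand-in for HowNet: maps synonymous nouns to one shared concept.
# The paper's HowNet-based concept space is NOT reproduced here.
CONCEPT_OF = {"电脑": "计算机", "计算机": "计算机", "移动电话": "手机", "手机": "手机"}


def nouns_only(tagged):
    """Keep words whose POS tag starts with 'n' (assumed tagging convention)."""
    return [word for word, pos in tagged if pos.startswith("n")]


def to_concepts(words):
    """Replace each word by its concept; words unknown to CONCEPT_OF are kept."""
    return [CONCEPT_OF.get(word, word) for word in words]


def tfidf(docs):
    """Standard smoothed TF-IDF per document (not the paper's improved variant)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1
        weights.append({
            term: (c / total) * (math.log((1 + n) / (1 + df[term])) + 1)
            for term, c in counts.items()
        })
    return weights


def top_k(weights, k):
    """Keep the k highest-weighted terms, breaking ties by term."""
    ranked = sorted(weights.items(), key=lambda item: (-item[1], item[0]))
    return dict(ranked[:k])


def represent(corpus, k_label=2, k_body=3):
    """Represent each document as the union of selected label and body features."""
    label_w = tfidf([to_concepts(nouns_only(d["label"])) for d in corpus])
    body_w = tfidf([to_concepts(nouns_only(d["body"])) for d in corpus])
    vectors = []
    for lw, bw in zip(label_w, body_w):
        vec = top_k(bw, k_body)
        for term, w in top_k(lw, k_label).items():
            # A term selected from both sets keeps the larger weight (assumed rule).
            vec[term] = max(vec.get(term, 0.0), w)
        vectors.append(vec)
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


# Toy corpus: "label" could be, e.g., a page title; tags follow the assumed scheme.
EXAMPLE_CORPUS = [
    {"label": [("电脑", "n"), ("评测", "vn")],
     "body": [("电脑", "n"), ("屏幕", "n"), ("很", "d"), ("清晰", "a"), ("价格", "n")]},
    {"label": [("计算机", "n"), ("新闻", "n")],
     "body": [("计算机", "n"), ("芯片", "n"), ("发布", "v"), ("价格", "n")]},
    {"label": [("手机", "n"), ("评测", "vn")],
     "body": [("移动电话", "n"), ("电池", "n"), ("续航", "vn"), ("价格", "n")]},
]

if __name__ == "__main__":
    vecs = represent(EXAMPLE_CORPUS)
    print(cosine(vecs[0], vecs[1]), cosine(vecs[0], vecs[2]))
```

In this toy corpus, the first two documents share the concept 计算机 after synonym mapping. They therefore score higher than the first and third documents, which share only 价格. Each vector keeps at most `k_label + k_body` features, which mirrors the dimension reduction the abstract describes. The vectors could then be passed to any clustering algorithm; the abstract does not say which one the authors use.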