Research on Feature Extension Strategies in Microblog Text Clustering

段旭磊 张仰森 郭正斌

Computer Engineering and Applications, 2017, Vol. 53, Issue 13: 90-94, 195, 6. DOI: 10.3778/j.issn.1002-8331.1606-0438

Feature extension of cluster analysis based on Microblog

段旭磊¹, 张仰森¹, 郭正斌¹

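Author information

  • 1. Institute of Intelligent Information Processing, Beijing Information Science and Technology University, Beijing 100192, China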

Abstract

Microblogs have become a major medium through which information is generated and spread. Microblog posts, however, differ from news web pages and blog articles: their text representations are high-dimensional and sparse, which poses great challenges for microblog text processing. Based on these characteristics, this paper compares short-text expansion strategies based on HowNet and Cilin, and proposes training Word2vec on a microblog corpus to construct a vocabulary of related words for the microblog context. Seed words and microblog label information are then used to expand the microblog text. The paper also puts forward methods for extracting keywords from microblog text and for distinguishing similar words from related words. Experiments show that expanding microblog text with Word2vec gives the better results among the strategies compared, and that it significantly improves the clustering of microblog text.
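The expansion idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy vectors below stand in for embeddings that a Word2vec model trained on a microblog corpus would supply, and the names `TOY_VECTORS`, `related_words`, `expand_text`, `top_k`, and `min_sim` are hypothetical. Seed-word selection, label handling, and the similar-versus-related distinction described in the paper are not modelled here.

```python
import math

# Toy 3-dimensional vectors (illustrative values only). In practice these
# would come from a Word2vec model trained on a microblog corpus.
TOY_VECTORS = {
    "earthquake": [0.9, 0.1, 0.0],
    "aftershock": [0.8, 0.2, 0.1],
    "rescue":     [0.7, 0.3, 0.0],
    "football":   [0.0, 0.9, 0.2],
    "goal":       [0.1, 0.8, 0.3],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def related_words(word, vectors, top_k=2, min_sim=0.8):
    """Return up to top_k vocabulary words whose similarity to `word`
    is at least min_sim, most similar first."""
    if word not in vectors:
        return []
    scored = [
        (cosine(vectors[word], vec), other)
        for other, vec in vectors.items()
        if other != word
    ]
    scored.sort(reverse=True)
    return [other for sim, other in scored[:top_k] if sim >= min_sim]


def expand_text(tokens, vectors, top_k=2, min_sim=0.8):
    """Append related words to a tokenized short text, skipping duplicates."""
    expanded = list(tokens)
    for tok in tokens:
        for cand in related_words(tok, vectors, top_k, min_sim):
            if cand not in expanded:
                expanded.append(cand)
    return expanded


print(expand_text(["earthquake", "today"], TOY_VECTORS))
```

Appending related words to each post gives sparse short texts more overlapping features, which is the property the clustering step relies on. The similarity threshold guards against adding loosely associated words.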

Key words

Microblog text/high-dimensional and sparse/keyword extraction/similar words/related words/feature expansion/clustering

Classification

Information Technology and Security Science

Cite this article

段旭磊, 张仰森, 郭正斌. Feature extension of cluster analysis based on Microblog[J]. Computer Engineering and Applications, 2017, 53(13): 90-94, 195, 6.

Funding

National Natural Science Foundation of China (No. 61370139)

Beijing Municipal Universities Innovation Team Building and Teacher Career Development Program (No. IDHT20130519)

Journal: Computer Engineering and Applications (OA; indexed in 北大核心, CSCD, CSTPCD), ISSN 1002-8331
