Journal of University of Electronic Science and Technology of China, Issue (4): 599-604, 6. DOI: 10.3969/j.issn.1001-0548.2015.04.021
An Internet Public Opinion Hotspot Detection Algorithm Based on Single-Pass
Abstract
This paper proposes single-pass*, an improved single-pass text clustering algorithm that takes into account both the time interval of Internet events and the differing importance of feature items according to where they appear in a semi-structured Web document. The algorithm assigns weights to feature items based on their location on the Web page, and it only needs to compute the similarity between a new document and the seed document of a cluster. Experimental results show that, compared with the original single-pass algorithm, single-pass* reduces the miss rate, the false detection rate, and the performance degradation caused by computing topic similarity for documents in a new Web data stream, and it improves clustering efficiency by 40% on average. The clustered Web texts can then be used to analyze Internet public opinion, including the relevance and the hotness of a topic.
Key words
public opinion analysis/single-pass/text clustering/topic detection
Classification
Information Technology and Security Science
Cite this article
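格桑多吉, 乔少杰, 韩楠, 张小松, 杨燕, 元昌安, 康健. An Internet Public Opinion Hotspot Detection Algorithm Based on Single-Pass [J]. Journal of University of Electronic Science and Technology of China, 2015, (4): 599-604, 6.
Funding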
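National Natural Science Foundation of China (61100045, 61165013); Specialized Research Fund for the Doctoral Program of Higher Education (20110184120008); China Postdoctoral Science Foundation Special Funded Project (201104697); Ministry of Education Humanities and Social Sciences Youth Fund (14YJCZH046); Fundamental Research Funds for the Central Universities (2682013BR023); Open Project of the Guangxi Colleges and Universities Key Laboratory of Scientific Computing and Intelligent Information Processing (GXSCIIP201407); Scientific Research Project of the Sichuan Provincial Department of Education (14ZB0458)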
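
Illustrative sketch

The abstract names three ideas: location-based feature weights, comparison against seed documents only, and the time interval of events. The Python sketch below shows one way these might fit together. It is not the authors' implementation. The weight values, the similarity threshold, the `max_gap` window, the choice of cosine similarity, and the reading of "seed document" as the first document of a cluster are all assumptions made for illustration, as are the names `weighted_vector`, `cosine`, and `single_pass_seed`.

```python
import math
from collections import Counter

# Hypothetical location weights: title terms count more than body terms.
LOCATION_WEIGHTS = {"title": 3.0, "body": 1.0}


def weighted_vector(doc):
    """Build a location-weighted term-frequency vector for one document."""
    vec = Counter()
    for location, weight in LOCATION_WEIGHTS.items():
        for term in doc.get(location, "").lower().split():
            vec[term] += weight
    return vec


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def single_pass_seed(docs, threshold=0.5, max_gap=3):
    """Cluster docs in arrival order, comparing each only to cluster seeds.

    A cluster is skipped when its seed is more than max_gap time units older
    than the new document (an assumed reading of the time-interval idea).
    """
    clusters = []  # each cluster keeps its seed vector, seed time, member ids
    for doc in docs:
        vec = weighted_vector(doc)
        best, best_sim = None, threshold
        for cluster in clusters:
            if doc["time"] - cluster["seed_time"] > max_gap:
                continue  # seed is too old to belong to the same event
            sim = cosine(vec, cluster["seed_vec"])
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            # No sufficiently similar recent seed: this document seeds a new cluster.
            clusters.append({"seed_vec": vec, "seed_time": doc["time"],
                             "members": [doc["id"]]})
        else:
            best["members"].append(doc["id"])
    return clusters
```

Because each incoming document is compared with one seed per cluster rather than with every stored document, the number of similarity computations grows with the number of clusters instead of the number of documents. This is the kind of saving the abstract attributes to the seed-only comparison.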