Journal of Integration Technology, Issue (5): 5-9, 5.
A Method Based on LCS for Detecting Similar Microblog Pages
Abstract
Microblogs are a relationship-based platform for sharing, spreading, and acquiring information; they are also a source of internet public opinion and an important arena for information transmission. Because forwarding is so convenient, large numbers of identical or near-identical microblog pages spread rapidly through the microblog space. Detecting similar microblog pages therefore matters both for reducing the user's browsing burden and for making the analysis of internet public opinion more efficient. This paper introduces a method based on the longest common subsequence (LCS) for detecting similar microblog pages: it first computes the candidate subset of documents for pages that may be similar, then computes their LCS and extracts the reliable parts, and finally identifies the similar pages. Experiments show that the method detects similar pages in microblog data accurately and efficiently.
Key words
Longest Common Subsequence (LCS) / near-duplicate detection / similarity measurement / microblog page
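The abstract outlines a pipeline of candidate selection followed by an LCS computation. The following is a minimal Python sketch of that general idea, not the paper's actual algorithm. Several choices here are assumptions that the abstract does not specify: whitespace tokenization (Chinese text would more likely be compared character by character or after word segmentation), the similarity measure (LCS length divided by the longer page's length), the `threshold` value of 0.8, and the length-based candidate filter. The function names are hypothetical, and the "reliable parts" extraction step is not modeled.

```python
from itertools import combinations


def lcs_length(a, b):
    """Length of the longest common subsequence of sequences a and b."""
    # Classic dynamic programme keeping only the previous row,
    # so time is O(len(a) * len(b)) and memory is O(len(b)).
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_similarity(a, b):
    """Assumed measure: LCS length divided by the longer sequence's length."""
    if not a and not b:
        return 1.0
    return lcs_length(a, b) / max(len(a), len(b))


def similar_pairs(pages, threshold=0.8):
    """Return index pairs (i, j) of pages whose LCS similarity >= threshold."""
    # Assumption: whitespace tokenization; swap in list(p) for characters.
    tokens = [p.split() for p in pages]
    result = []
    for i, j in combinations(range(len(tokens)), 2):
        a, b = tokens[i], tokens[j]
        longer = max(len(a), len(b))
        # Hypothetical candidate filter: the LCS can never be longer than
        # the shorter page, so a pair whose length ratio is already below
        # the threshold cannot qualify and the costly LCS step is skipped.
        if longer == 0 or min(len(a), len(b)) / longer < threshold:
            continue
        if lcs_similarity(a, b) >= threshold:
            result.append((i, j))
    return result
```

Under this measure the length check discards no true matches, which is why it can serve as a cheap stand-in for the candidate-subset step; the paper's own candidate selection may work quite differently.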
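Cite this article
Zhang Zongfu. A Method Based on LCS for Detecting Similar Microblog Pages[J]. Journal of Integration Technology, 2013, (5): 5-9, 5.
Funding
National Natural Science Foundation of China (Grant No. 61272013) and the 2012 Research Project of the Guangdong Province Education Science "Twelfth Five-Year Plan" (Grant No. 2010TJK311).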