Computer Applications and Software (计算机应用与软件), Issue (3): 16-19, 30, 5. DOI: 10.3969/j.issn.1000-386x.2014.03.005
PageRank Focused Crawler Algorithm Based on Fuzzy SVDD Supervision
Abstract
A focused crawler is a web crawler that collects resources from a specific field. To ensure the focused crawler's precision, the article proposes a PageRank crawler algorithm based on fuzzy SVDD (support vector domain description) supervision, which not only considers the link relations among pages but also uses classifier supervision to prevent the crawler from drifting off topic. Experimental comparison with a keyword-matching focused crawler, a shark-search focused crawler, a PageRank focused crawler, an SVM-prediction-based focused crawler, and an ordinary SVDD-guided focused crawler validates that the proposed algorithm is more precise.
Keywords
Fuzzy SVDD / PageRank / Focused crawler
Classification
Information Technology and Safety Science
Cite this article
Wang Wei (汪伟), Wei Yan (魏岩), Yang Yupu (杨煜普). PageRank focused crawler algorithm based on fuzzy SVDD supervision [J]. Computer Applications and Software, 2014, (3): 16-19, 30, 5.
Funding
National High Technology Research and Development Program of China (2011AA 040605).
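The abstract combines two ideas: ranking candidate pages by link structure (PageRank) and using a one-class classifier to keep the crawl on topic. The sketch below is only an illustration of that combination, not the authors' algorithm. It assumes a toy in-memory link graph whose PageRank is computed once up front, precomputed feature vectors for each page, and a hard-boundary sphere test as a stand-in for the fuzzy SVDD classifier, which this record does not specify. All function names, parameters, and data are hypothetical.

```python
import heapq
import math


def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict {page: [outlinks]}."""
    pages = set(graph) | {q for links in graph.values() for q in links}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Rank held by pages without outlinks is spread evenly over all pages.
        dangling = sum(rank[p] for p in pages if not graph.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for p, links in graph.items():
            for q in links:
                new_rank[q] += damping * rank[p] / len(links)
        rank = new_rank
    return rank


def inside_sphere(vector, center, radius):
    """Assumed stand-in for the SVDD decision: accept points within the sphere."""
    return math.dist(vector, center) <= radius


def supervised_crawl(graph, features, seeds, center, radius, budget=10):
    """Visit pages in descending PageRank order; keep and expand only accepted pages."""
    rank = pagerank(graph)
    frontier = [(-rank[s], s) for s in seeds]  # negate: heapq is a min-heap
    heapq.heapify(frontier)
    seen = set(seeds)
    collected = []
    while frontier and len(collected) < budget:
        _, page = heapq.heappop(frontier)
        if not inside_sphere(features[page], center, radius):
            continue  # off-topic page: neither stored nor expanded
        collected.append(page)
        for link in graph.get(page, []):
            if link not in seen:
                seen.add(link)
                heapq.heappush(frontier, (-rank[link], link))
    return collected


# Hypothetical toy data: page "b" lies far outside the topic sphere.
graph = {"seed": ["a", "b"], "a": ["c"], "b": ["c"], "c": ["seed"]}
features = {"seed": (0.0, 0.0), "a": (0.1, 0.2), "b": (3.0, 3.0), "c": (0.2, 0.1)}
result = supervised_crawl(graph, features, ["seed"], center=(0.0, 0.0), radius=1.0)
print(result)  # ['seed', 'a', 'c']
```

In this toy run the classifier gate rejects page "b", so it is never stored and its outlinks are never followed. A real crawler would discover the graph incrementally and would need a trained classifier in place of the fixed sphere.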