Research on a Dynamically Adaptive Topical Crawler (可动态自适应主题爬虫的研究)

肖新凤 余伟 李石君 陈亚辉 刘倍雄 刘永明

Computer & Digital Engineering (计算机与数字工程), 2019, Vol. 47, Issue 5: 1151-1159, 9. DOI: 10.3969/j.issn.1672-9722.2019.05.027

Research and Implementation of Dynamic Adaptive Topical Crawler

肖新凤 (1), 余伟 (2), 李石君 (2), 陈亚辉 (2), 刘倍雄 (1), 刘永明 (1)

Author information

  • 1. Guangdong Polytechnic of Environmental Protection Engineering, Foshan 528216
  • 2. Wuhan University, Wuhan 430079

Abstract

Faced with a dynamically changing Internet, traditional topical crawlers suffer from problems such as incomplete topical knowledge, outdated domain knowledge, and shifts in topical resource centers. This paper proposes a topical crawler that adapts dynamically to Internet information. Its TopicHub algorithm selects seed URLs dynamically; compared with a traditional topical crawler that uses static seed URLs, crawling efficiency increases by more than 7% and recall increases by more than 5%. In addition, to address the incomplete coverage of topic information and the need to update domain knowledge in a static ontology library, an algorithm named SDTP is proposed that dynamically expands domain semantic information. Compared with the traditional algorithm based on a static ontology library, its precision improves by 13%, and compared with the VSM-based algorithm, the improvement is 4%.

Key words

topic crawler / dynamic self-adaptation / URL graph structure

Classification

Information Technology and Security Science

Cite this article

肖新凤, 余伟, 李石君, 陈亚辉, 刘倍雄, 刘永明. 可动态自适应主题爬虫的研究[J]. 计算机与数字工程, 2019, 47(5): 1151-1159, 9.

Funding

National Natural Science Foundation of China (Grant No. 61502350)

2017 Guangdong Provincial Key Platforms and Major Scientific Research Projects of Universities (Grant No. 2017GKTSCX042)

Journal: Computer & Digital Engineering (计算机与数字工程) · OA · CSTPCD · ISSN 1672-9722
