
A Strategy of Disaster Focused Crawler Based on Ontology Semantics (一种基于本体语义的灾害主题爬虫策略)

Ma Leilei (马雷雷)¹, Li Hongwei (李宏伟)², Lian Shiwei (连世伟)¹, Liang Rupeng (梁汝鹏)¹, Chen Hu (陈虎)¹

Computer Engineering (计算机工程), 2016, Vol. 42, Issue 11: 50-56, 7. DOI: 10.3969/j.issn.1000-3428.2016.11.009

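Author information

  • 1. Institute of Geospatial Information, Information Engineering University (信息工程大学 地理空间信息学院), Zhengzhou 450052
  • 2. Sichuan Engineering Research Center for Emergency Mapping and Disaster Reduction (四川省应急测绘与防灾减灾工程技术研究中心), Chengdu 610041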

Abstract

This paper introduces ontology semantics and proposes a new focused-crawler strategy for retrieving disaster-themed webpages from the Internet efficiently and accurately. First, the framework and workflow of the disaster focused crawler are designed, and an improved method for calculating ontology semantic similarity is proposed. Second, a thematic semantic vector is computed from the semantic similarity, a webpage text feature vector is obtained by weighting terms according to their HTML location, and the thematic relevance of the page is calculated. Third, a method for calculating the relevance of URL anchor text is proposed, URL link priority is analyzed, and the crawling queue is optimized. Earthquake disasters and meteorological disasters are selected as test cases, and the experimental results show that the proposed strategy improves the stability and accuracy of crawling.
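The pipeline summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no formulas, so the location weights, the use of cosine similarity as the vector space model relevance measure, and all names below (`LOCATION_WEIGHTS`, `weighted_term_vector`, `cosine_relevance`, `CrawlQueue`) are assumptions made for illustration. The topic vector is assumed to map each term to its semantic similarity with the topic concept.

```python
import heapq
import math
from collections import Counter

# Hypothetical location weights; the abstract does not state the paper's values.
LOCATION_WEIGHTS = {"title": 3.0, "heading": 2.0, "body": 1.0}


def weighted_term_vector(fields):
    """Build a term -> weight map from {html_location: [terms]}."""
    vector = Counter()
    for location, terms in fields.items():
        weight = LOCATION_WEIGHTS.get(location, 1.0)
        for term in terms:
            vector[term] += weight
    return vector


def cosine_relevance(topic_vector, page_vector):
    """Cosine similarity of two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * page_vector.get(t, 0.0) for t, w in topic_vector.items())
    norm_t = math.sqrt(sum(w * w for w in topic_vector.values()))
    norm_p = math.sqrt(sum(w * w for w in page_vector.values()))
    if norm_t == 0.0 or norm_p == 0.0:
        return 0.0
    return dot / (norm_t * norm_p)


class CrawlQueue:
    """URL queue that pops the highest-scoring link first."""

    def __init__(self):
        self._heap = []
        self._counter = 0  # tie-breaker: equal scores pop in insertion order

    def push(self, url, score):
        # heapq is a min-heap, so the score is negated to pop the maximum.
        heapq.heappush(self._heap, (-score, self._counter, url))
        self._counter += 1

    def pop(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)
```

In use, a page would be scored with `cosine_relevance(topic, weighted_term_vector(fields))`, and each outgoing link would be pushed onto a `CrawlQueue` with a score derived from its anchor text, so that the most promising links are fetched first.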

Key words

focused crawler/ontology/semantic similarity/Vector Space Model (VSM)/relevance calculation/anchor text

Classification

Astronomy and Earth Sciences

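Cite this article

Ma Leilei, Li Hongwei, Lian Shiwei, Liang Rupeng, Chen Hu. A Strategy of Disaster Focused Crawler Based on Ontology Semantics [J]. Computer Engineering, 2016, 42(11): 50-56, 7.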

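Funding

National Natural Science Foundation of China (41271392, 41401463, 41571394); Open Fund of the Sichuan Engineering Research Center for Emergency Mapping and Disaster Reduction (K2015B014).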

Computer Engineering (计算机工程)

OA / Peking University Core Journal (北大核心) / CSCD / CSTPCD

ISSN 1000-3428
