Computer Engineering (计算机工程), 2016, Vol. 42, Issue 11: 50-56, 7. DOI: 10.3969/j.issn.1000-3428.2016.11.009
一种基于本体语义的灾害主题爬虫策略
A Strategy of Disaster Focused Crawler Based on Ontology Semantics
Abstract
This paper introduces ontology semantics and proposes a new disaster-focused crawler strategy for retrieving disaster-themed webpages from the Internet efficiently and accurately. First, the framework and workflow of the disaster-focused crawler are designed, and an improved ontology semantic similarity calculation method is proposed. Second, the thematic semantic vector is calculated from semantic similarity, the webpage text feature vector is obtained by HTML location weighting, and the thematic relevance is calculated. Then, a relevance calculation method for URL anchor text is proposed, URL link priority is analyzed, and the crawling queue is optimized. Earthquake disasters and meteorological disasters are selected for testing and analysis, and the experimental results show that the proposed strategy can improve stability and accuracy.
Key words
focused crawler/ontology/semantic similarity/Vector Space Model (VSM)/relevance calculation/anchor text
Classification
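Astronomy and Earth Sciences
Cite this article
马雷雷 (Ma Leilei), 李宏伟 (Li Hongwei), 连世伟 (Lian Shiwei), 梁汝鹏 (Liang Rupeng), 陈虎 (Chen Hu). A Strategy of Disaster Focused Crawler Based on Ontology Semantics [J]. Computer Engineering, 2016, 42(11): 50-56, 7.
Funding
National Natural Science Foundation of China (41271392, 41401463, 41571394); Open Fund of the Sichuan Engineering Technology Research Center for Emergency Surveying, Mapping and Disaster Prevention and Mitigation (K2015B014).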
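
The relevance pipeline summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the location weights, the topic term weights, and the names `page_vector`, `cosine`, and `CrawlQueue` are all hypothetical, and the paper's improved ontology similarity measure is replaced here by hand-picked weights. The sketch only shows the general shape of the approach: a location-weighted term vector for a page, cosine similarity against a topic vector in the Vector Space Model, and a crawl queue ordered by anchor text relevance.

```python
import heapq
import math
from collections import Counter

# Hypothetical location weights: terms in the title count more than body terms.
LOCATION_WEIGHTS = {"title": 3.0, "heading": 2.0, "body": 1.0}


def page_vector(sections):
    """Build a location-weighted term vector.

    `sections` maps an HTML location name to the list of terms found there.
    Unknown locations fall back to a weight of 1.0.
    """
    vec = Counter()
    for location, terms in sections.items():
        weight = LOCATION_WEIGHTS.get(location, 1.0)
        for term in terms:
            vec[term] += weight
    return dict(vec)


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


class CrawlQueue:
    """Priority queue of URLs; the highest priority is popped first."""

    def __init__(self):
        self._heap = []
        self._count = 0  # tie-breaker that keeps insertion order stable

    def push(self, url, priority):
        # heapq is a min-heap, so the priority is negated.
        heapq.heappush(self._heap, (-priority, self._count, url))
        self._count += 1

    def pop(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


# Assumed topic vector; in the paper these weights would come from
# ontology-based semantic similarity to the topic concept.
topic = {"earthquake": 1.0, "aftershock": 0.8, "seismic": 0.7}

page = page_vector({"title": ["earthquake"], "body": ["seismic", "news"]})
page_relevance = cosine(page, topic)

# Assumption: a link's priority is simply the relevance of its anchor text.
queue = CrawlQueue()
links = [
    ("http://example.com/sports", ["football", "news"]),
    ("http://example.com/quake", ["earthquake", "aftershock"]),
]
for url, anchor_terms in links:
    queue.push(url, cosine(page_vector({"body": anchor_terms}), topic))
```

In this sketch the link whose anchor text shares terms with the topic vector is dequeued before the unrelated one, which is the effect the queue optimization in the abstract aims for.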