Deep Web Data Source Discovery Method Based on MapReduce Virtual Machines

Journal on Communications, 2011, Vol. 32, Issue (7): 189-195, 7.

Applying MapReduce frameworks to a virtualization platform for Deep Web data source discovery

Xin Jie¹, Cui Zhiming¹, Zhao Pengpeng¹, Zhang Guangming¹, Xian Xuefeng¹

Author information

1. Institute of Intelligent Information Processing and Application, Soochow University, Suzhou 215006, Jiangsu, China

Abstract

To improve the performance of Deep Web crawlers in discovering and searching data source interfaces, a new method is proposed that processes the massive data within the Deep Web in parallel by combining the MapReduce programming model with virtualization technology. The new crawling architecture is designed around three procedures: link-classification MapReduce, page-classification MapReduce, and form-classification MapReduce. Server virtualization is adopted to simulate a cluster environment for performance testing. Experimental results indicate that the method is capable of large-scale parallel data computing, improves crawling efficiency, and avoids wasted expenditure, which demonstrates the feasibility of applying cloud technologies to the field of Deep Web data mining.
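The three-stage architecture named in the abstract can be illustrated with a minimal in-memory sketch. It is not the authors' implementation: the abstract gives no classifier details, so the keyword hints, the `PAGES` toy corpus, the rule for a searchable form (a text input and no password input), and every function name below are assumptions made purely for illustration. `run_job` is a small single-process stand-in for one MapReduce job (map, group by key, reduce).

```python
import re
from collections import defaultdict

# Hypothetical toy corpus standing in for fetched pages (url -> HTML).
PAGES = {
    "http://a.example/search": '<form action="/q"><input type="text" name="kw"></form>',
    "http://a.example/login": '<form action="/in"><input type="text" name="u">'
                              '<input type="password" name="p"></form>',
    "http://a.example/about": "<p>About us</p>",
}


def run_job(records, mapper, reducer):
    """Single-process stand-in for one MapReduce job: map, group by key, reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    output = []
    for key in sorted(groups):
        output.extend(reducer(key, groups[key]))
    return output


# Stage 1: link classification (assumed keyword heuristic).
LINK_HINTS = ("search", "query", "find", "login")


def link_mapper(link):
    url, anchor_text = link
    if any(hint in (url + " " + anchor_text).lower() for hint in LINK_HINTS):
        yield url, 1


def link_reducer(url, counts):
    yield url  # one output per distinct URL, so duplicates collapse


# Stage 2: page classification (assumed rule: keep pages that contain a form).
def page_mapper(url):
    html = PAGES.get(url, "")
    if "<form" in html.lower():
        yield url, html


def page_reducer(url, htmls):
    yield url, htmls[0]


# Stage 3: form classification (assumed rule: text input and no password input).
FORM_RE = re.compile(r"<form\b.*?</form>", re.I | re.S)


def form_mapper(page):
    url, html = page
    for form in FORM_RE.findall(html):
        lowered = form.lower()
        if 'type="text"' in lowered and 'type="password"' not in lowered:
            yield url, form


def form_reducer(url, forms):
    yield url, len(forms)  # number of candidate query forms on the page


def discover_sources(links):
    """Chain the three jobs: links -> candidate URLs -> form pages -> query forms."""
    candidate_urls = run_job(links, link_mapper, link_reducer)
    form_pages = run_job(candidate_urls, page_mapper, page_reducer)
    return run_job(form_pages, form_mapper, form_reducer)
```

Calling `discover_sources` with a list of `(url, anchor_text)` pairs returns `(url, form_count)` pairs for pages that pass all three filters. On a real cluster each stage would presumably run as a separate distributed job, with the output of one stage stored as the input of the next; here the chaining is done with plain Python lists.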

Key words

data source discovery / MapReduce / Deep Web / virtualization technology / cloud computing

Classification

Information Technology and Security Science

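Cite this article

Xin Jie, Cui Zhiming, Zhao Pengpeng, Zhang Guangming, Xian Xuefeng. Deep Web data source discovery method based on MapReduce virtual machines [J]. Journal on Communications, 2011, 32(7): 189-195, 7.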

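Funding

National Natural Science Foundation of China (60970015, 61003054)

Jiangsu Province Enterprise Doctoral Innovation Project (BK2009563)

Natural Science Research Project of Jiangsu Higher Education Institutions (10KJB520018); Suzhou Special Fund for Technology Innovation in Science and Technology Enterprises (SG201043)

Jiangsu Province 2010 Graduate Research and Innovation Program for Regular Higher Education Institutions (CX10B_041Z)

Jiangsu Province Fund for Promoting the Industrialization of Scientific Research Achievements in Regular Higher Education Institutions (JH09-46)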

Journal on Communications

OA / Peking University Core Journals / CSCD / CSTPCD

ISSN: 1000-436X
