Deep Web Data Extraction and Mining in Global Mode

姚晓鹏 高圣兴 薛君志 陆敏超

Computer Applications and Software, 2018, Vol. 35, Issue 2: 91-95, 5. DOI: 10.3969/j.issn.1000-386x.2018.02.016

DEEP WEB DATA EXTRACTION AND MINING IN GLOBAL MODE

姚晓鹏 1, 高圣兴 2, 薛君志 3, 陆敏超 1

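Author information

  • 1. Shanghai Shenteng Information Technology Co., Ltd., Shanghai 200040
  • 2. Shanghai Institute of Computing Technology, Shanghai 200040
  • 3. School of Statistics and Mathematics, Zhejiang Gongshang University, Hangzhou, Zhejiang 310018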

Abstract

With the rapid growth of online information, the deep web, as a carrier of network data, holds a large amount of data, so studying deep web data extraction is important. This paper proposes a method for data extraction and mining under a global schema. The method analyzes the attributes of actual instances, uses an improved Bayesian belief network algorithm to determine the corresponding labels, and constructs a dynamic global schema. It then extracts and identifies the data in result pages, and detects and removes useless information using density-based outlier detection. Finally, an algorithm for mining the frequent itemsets of Boolean association rules is applied to the extracted data. Experimental results show that, compared with other methods, the proposed method extracts data accurately, quickly, and effectively, and that after mining, the data items are strongly correlated and contain little invalid information.
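
The mining step above can be illustrated with a minimal sketch. The abstract does not name the algorithm or give its parameters; the code below assumes a classic Apriori-style search for frequent itemsets, and the function name `frequent_itemsets`, the `min_support` threshold, and the sample records are all hypothetical, chosen only for illustration.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support count} for itemsets seen in at least
    min_support transactions (Apriori-style, level by level)."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count how many transactions contain each single item.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: n for s, n in counts.items() if n >= min_support}
    result = dict(current)

    k = 2
    while current:
        # Join: merge pairs of frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a, b in combinations(list(current), 2)
                      if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current
                             for s in combinations(c, k - 1))}
        # Count the support of the surviving candidates.
        counts = {c: sum(1 for t in transactions if c <= t)
                  for c in candidates}
        current = {s: n for s, n in counts.items() if n >= min_support}
        result.update(current)
        k += 1
    return result


# Hypothetical extracted records: each set lists the attribute labels
# that were present in one record of a result page.
records = [
    {"title", "author", "price"},
    {"title", "author"},
    {"title", "price"},
    {"title", "author", "price"},
]
for itemset, support in frequent_itemsets(records, min_support=3).items():
    print(sorted(itemset), support)
```

Itemsets that survive the support threshold are the raw material for Boolean association rules; the rule-generation and confidence-filtering stages are omitted here because the abstract gives no details about them.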

Key words

Deep web / Global schema / Data extraction / Data mining

Classification

Information Technology and Security Science

Cite this article

姚晓鹏, 高圣兴, 薛君志, 陆敏超. Deep web data extraction and mining in global mode [J]. Computer Applications and Software, 2018, 35(2): 91-95, 5.

Funding

Shanghai Lingang Area Intelligent Manufacturing Industry Special Project (ZN2016020103).

Computer Applications and Software

OA / Peking University Core Journal / CSTPCD

ISSN 1000-386X
