A Web Entity Information Extraction Method Based on SVM and AdaBoost

Sun Ming (孙明), Lu Chunsheng (陆春生), Xu Xiuxing (徐秀星), Li Qingzhong (李庆忠), Peng Zhaohui (彭朝晖)

Computer Applications and Software (计算机应用与软件), 2013, Vol. 30, Issue 4: 101-106, 152 (7 pages). DOI: 10.3969/j.issn.1000-386x.2013.04.028

A WEB ENTITY INFORMATION EXTRACTION METHOD BASED ON SVM AND ADABOOST

Abstract

This paper proposes a Web entity information extraction method based on SVM and AdaBoost. First, an SVM-based method identifies the main data region of a Web page: it segments the page into data regions according to the display characteristics of Web entity instances, then identifies the main data region in which those instances are located. Second, based on the characteristics of Web entity attribute labels, an AdaBoost ensemble learning method automatically extracts Web entity information from the main data region. Experiments on two real data sets, together with comparisons against related work, show that the method achieves fairly good extraction results.
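The abstract does not say which features or weak learners the AdaBoost stage uses, so the following is only a minimal sketch of AdaBoost itself, not the authors' implementation. It assumes threshold decision stumps as weak learners and two hypothetical features per text node, `ends_with_colon` and `token_count`, to separate attribute labels (+1) from other text (-1). The toy data is invented for illustration.

```python
import math

def train_stump(X, y, w):
    """Return (weighted_error, feature, threshold, polarity) of the best stump."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for polarity in (1, -1):
                preds = [polarity if row[j] >= thr else -polarity for row in X]
                err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, polarity)
    return best

def stump_predict(stump, row):
    _, j, thr, polarity = stump
    return polarity if row[j] >= thr else -polarity

def adaboost_fit(X, y, rounds):
    """Train `rounds` stumps, reweighting samples after each round."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        stump = train_stump(X, y, w)
        # Clamp the error so the log below stays finite.
        err = min(max(stump[0], 1e-10), 1 - 1e-10)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Increase the weight of misclassified samples, decrease the rest.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for wi, row, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def adaboost_predict(ensemble, row):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1

# Hypothetical toy data: [ends_with_colon, token_count] per text node.
# Label +1 = attribute label, -1 = anything else.
X = [[1, 1], [1, 2], [0, 2], [0, 7], [1, 8], [0, 9]]
y = [1, 1, -1, -1, -1, -1]

model = adaboost_fit(X, y, rounds=3)
print([adaboost_predict(model, row) for row in X])
```

No single stump separates this toy set, because a label here must both end with a colon and be short. The weighted vote of three stumps does, which is the kind of gain ensemble learning is meant to provide.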

Keywords

Web information extraction / Page segmentation / Ensemble learning

Classification

Information Technology and Security Science

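Cite this article

Sun Ming, Lu Chunsheng, Xu Xiuxing, Li Qingzhong, Peng Zhaohui. A Web entity information extraction method based on SVM and AdaBoost [J]. Computer Applications and Software, 2013, 30(4): 101-106, 152.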

Funding

National Key Technology R&D Program of China (2008BAH32B01).

Computer Applications and Software (计算机应用与软件)

Open access; indexed in PKU Core (北大核心), CSCD, CSTPCD

ISSN: 1000-386X
