A Data Automatic Extraction Method Based on Web Content

ZHU Yongsheng¹, WANG Jun¹

Computer Technology and Development, 2012, Vol. 22, Issue (5): 87-89, 93, 4.

Author information

1. Network Information Center, Nanjing University of Information Science & Technology, Nanjing, Jiangsu 210044

Abstract

The rapid development of the Web has made it an increasingly important source from which people obtain useful data. Current Web sites present information on a wide range of topics in varied formats and structures, and the way Web content is organized within pages makes it difficult to extract target data systematically and effectively. This paper uses ASP.NET technology to develop an automatic data extraction method based on Web content. The method first selects the target data sources, then automatically accesses each data source to obtain the static HTML document content, generates a description file for the web page according to fixed rules, parses the HTML document, and sets a target anchor. Finally, it uses regular expressions and C# to automatically extract the target data and generate the required Web page. With this method, Web users can quickly obtain the data they need.
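The abstract describes the pipeline (fetch static HTML, apply a per-page description file, locate a target anchor, extract with regular expressions, generate a page) without giving code. The following is only a minimal sketch of that anchor-plus-regex idea, written in Python rather than the authors' ASP.NET/C#. The description-file layout (`PageDescription` with `start_anchor`, `end_anchor`, and `record_pattern`, stored as JSON), every function name, and the sample page are hypothetical assumptions for illustration, not the paper's actual design.

```python
import html
import json
import re
import urllib.request
from dataclasses import dataclass


@dataclass
class PageDescription:
    """Hypothetical description file for one source page (field names are assumptions)."""
    start_anchor: str    # literal text marking the start of the target region
    end_anchor: str      # literal text marking the end of the target region
    record_pattern: str  # regex with named groups; one match per target record


def fetch_html(url: str, encoding: str = "utf-8") -> str:
    """Download the static HTML of a data source (not called in the demo below)."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read().decode(encoding, errors="replace")


def extract_records(page: str, desc: PageDescription) -> list[dict[str, str]]:
    """Cut out the region between the anchors, then apply the record regex only there."""
    start = page.find(desc.start_anchor)
    if start == -1:
        return []  # anchor missing: the page layout has probably changed
    start += len(desc.start_anchor)
    end = page.find(desc.end_anchor, start)
    region = page[start:] if end == -1 else page[start:end]
    pattern = re.compile(desc.record_pattern, re.S | re.I)
    return [
        {key: html.unescape(value.strip()) for key, value in m.groupdict().items()}
        for m in pattern.finditer(region)
    ]


def render_table(records: list[dict[str, str]], columns: list[str]) -> str:
    """Generate a simple HTML table from the extracted records."""
    head = "".join(f"<th>{html.escape(c)}</th>" for c in columns)
    rows = "".join(
        "<tr>" + "".join(f"<td>{html.escape(r.get(c, ''))}</td>" for c in columns) + "</tr>"
        for r in records
    )
    return f"<table><tr>{head}</tr>{rows}</table>"


# Hypothetical description file stored as JSON (regex backslashes doubled for JSON).
DESC_JSON = r'''{
  "start_anchor": "<!-- data-begin -->",
  "end_anchor": "<!-- data-end -->",
  "record_pattern": "<tr>\\s*<td>(?P<city>.*?)</td>\\s*<td>(?P<temp>.*?)</td>\\s*</tr>"
}'''
desc = PageDescription(**json.loads(DESC_JSON))

# Made-up sample page; the footer row lies outside the anchored region.
sample = """<html><body>
<div id="nav">menu</div>
<!-- data-begin -->
<tr><td>Nanjing</td><td>21.5</td></tr>
<tr><td>Suzhou</td><td>22.0</td></tr>
<!-- data-end -->
<tr><td>Footer</td><td>0</td></tr>
</body></html>"""

records = extract_records(sample, desc)
print(records)  # [{'city': 'Nanjing', 'temp': '21.5'}, {'city': 'Suzhou', 'temp': '22.0'}]
print(render_table(records, ["city", "temp"]))
```

In this sketch, locating the anchor first keeps the regex from matching look-alike rows elsewhere on the page (the footer row here). When a site changes its layout, only the description file would need updating, not the extraction code.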

Key words

Web extraction/HTML/anchor/transform/ASP.NET

Classification

Information Technology and Security Science

Cite this article

ZHU Yongsheng, WANG Jun. A Data Automatic Extraction Method Based on Web Content[J]. Computer Technology and Development, 2012, 22(5): 87-89, 93, 4.

Funding

Jiangsu Provincial Special Research Project for Public Welfare Industries (GYHY201106037)

Computer Technology and Development

OA, CSTPCD

ISSN: 1673-629X
