Computer Technology and Development (计算机技术与发展), 2012, Vol. 22, Issue 5: 87-89, 93 (4 pages).
A Data Automatic Extraction Method Based on Web Content
Abstract
The rapid development of the Web has made it an increasingly important source from which people obtain useful data. Current Web sites present information on many topics in diverse formats and structures, and the page-based organization of Web content makes it difficult to extract target data effectively in a systematic way. This paper uses ASP.NET to develop an automatic data extraction method based on Web content. The method first selects the target data sources, then automatically invokes each data source to obtain the static HTML document content, generates a description file for the web page according to fixed rules, parses the HTML document and sets a target anchor; finally, it uses regular expressions and C# to extract the target data automatically and generate the required Web page. The method lets Web users quickly obtain the data they need.
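The abstract describes the pipeline only at a high level, and the authors implement it in C# on ASP.NET. The following is a hypothetical Python sketch of its core idea, not the paper's implementation: locate the target region of a static HTML page by anchors recorded in a description file, pull records out of that region with a regular expression, and re-render them as a new page. It assumes the description file can be reduced to a start anchor, an end anchor and a record pattern; `PageDescription`, `extract_records` and `render_table` are illustrative names, and fetching the page from the data source is omitted (the HTML is passed in as a string).

```python
import html
import re
from dataclasses import dataclass


@dataclass
class PageDescription:
    """Hypothetical stand-in for the paper's per-page description file."""
    start_anchor: str    # assumed: literal text marking where the target data begins
    end_anchor: str      # assumed: literal text marking where the target data ends
    record_pattern: str  # assumed: regex with named groups, one match per record


def extract_records(page: str, desc: PageDescription) -> list[dict[str, str]]:
    """Cut out the region between the anchors, then apply the record regex to it."""
    start = page.find(desc.start_anchor)
    if start == -1:
        return []  # anchor missing: the page layout has probably changed
    start += len(desc.start_anchor)
    end = page.find(desc.end_anchor, start)
    region = page[start:] if end == -1 else page[start:end]  # no end anchor: read to the end
    pattern = re.compile(desc.record_pattern, re.DOTALL | re.IGNORECASE)
    # Strip whitespace and decode entities such as &amp; in every captured field.
    return [
        {key: html.unescape(value.strip()) for key, value in m.groupdict().items()}
        for m in pattern.finditer(region)
    ]


def render_table(records: list[dict[str, str]], columns: list[str]) -> str:
    """Generate a minimal HTML page that contains only the extracted data."""
    head = "".join(f"<th>{html.escape(c)}</th>" for c in columns)
    rows = "".join(
        "<tr>" + "".join(f"<td>{html.escape(r.get(c, ''))}</td>" for c in columns) + "</tr>"
        for r in records
    )
    return f"<html><body><table><tr>{head}</tr>{rows}</table></body></html>"


if __name__ == "__main__":
    # Illustrative input: the navigation and footer lie outside the anchors and are ignored.
    source = """
    <html><body>
    <div id="nav">Home | News</div>
    <!-- data-begin -->
    <tr><td>alpha</td><td>1.5</td></tr>
    <tr><td>beta</td><td>2.0</td></tr>
    <!-- data-end -->
    <div id="footer">Copyright</div>
    </body></html>
    """
    desc = PageDescription(
        start_anchor="<!-- data-begin -->",
        end_anchor="<!-- data-end -->",
        record_pattern=r"<tr>\s*<td>(?P<name>.*?)</td>\s*<td>(?P<value>.*?)</td>\s*</tr>",
    )
    records = extract_records(source, desc)
    print(records)
    print(render_table(records, ["name", "value"]))
```

Bounding the regular expression by anchors keeps it from matching look-alike markup elsewhere on the page, such as navigation or footer tables; if the site changes its layout so that the start anchor disappears, the sketch returns an empty list instead of wrong data.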
Keywords
Web extraction / HTML / anchor / transform / ASP.NET
Classification
Information Technology and Security Science
Cite this article
Zhu Yongsheng, Wang Jun. A Data Automatic Extraction Method Based on Web Content[J]. Computer Technology and Development, 2012, 22(5): 87-89, 93.
Funding
Jiangsu Province Special Scientific Research Project for Public Welfare Industries (GYHY201106037)