Computer Engineering (计算机工程), Issue (10): 309-312, 4. DOI: 10.3969/j.issn.1000-3428.2013.10.068
基于DOM树和视觉特征的网页信息自动抽取
Web Information Automatic Extraction Based on DOM Tree and Visual Feature
Abstract
This paper proposes an automatic Web information extraction method based on the Document Object Model (DOM) tree and visual features, aimed at extracting business information from the list pages of life information websites. The method first obtains candidate target data regions by analyzing the DOM tree and the visual features of data regions in list pages. It then uses visual features to identify the target data region and, finally, extracts the data records. Tested on ten life information websites, the method achieves 100% recall and 100% precision on eight of them. The results indicate that the proposed method performs well on this task.
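The three stages named in the abstract (find candidate data regions, pick the target region, extract records) can be illustrated with a minimal Python sketch. This is not the authors' algorithm, which the abstract does not specify. It assumes that a candidate region is any node with at least `min_records` structurally identical child elements, and it uses total text length as a crude stand-in for real visual features such as rendered area. All names (`Node`, `TreeBuilder`, `candidate_regions`, `pick_target`, `extract_records`, `SAMPLE`) are hypothetical.

```python
from collections import Counter
from html.parser import HTMLParser

VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class Node:
    """A minimal DOM node; text is stored in child nodes tagged '#text'."""

    def __init__(self, tag, parent=None, text=""):
        self.tag, self.parent, self.text = tag, parent, text
        self.children = []

    def elements(self):
        return [c for c in self.children if c.tag != "#text"]

    def full_text(self):
        if self.tag == "#text":
            return self.text
        parts = (c.full_text() for c in self.children)
        return " ".join(p for p in parts if p)

    def signature(self):
        # Structural signature: own tag plus the tags of direct child elements.
        return (self.tag, tuple(c.tag for c in self.elements()))


class TreeBuilder(HTMLParser):
    """Builds a Node tree from HTML using only the standard library."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag, if there is one.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        data = data.strip()
        if data:
            self.current.children.append(Node("#text", self.current, data))


def candidate_regions(root, min_records=3):
    """Assumed heuristic: a region is a node with repeated, same-shaped children."""
    found, stack = [], [root]
    while stack:
        node = stack.pop()
        elems = node.elements()
        counts = Counter(c.signature() for c in elems)
        if counts:
            sig, n = counts.most_common(1)[0]
            if n >= min_records:
                found.append((node, [c for c in elems if c.signature() == sig]))
        stack.extend(elems)
    return found


def pick_target(candidates):
    """Assumed visual proxy: prefer the region whose records carry the most text."""
    return max(candidates,
               key=lambda c: sum(len(r.full_text()) for r in c[1]),
               default=None)


def extract_records(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    target = pick_target(candidate_regions(builder.root))
    return [] if target is None else [r.full_text() for r in target[1]]


SAMPLE = """
<html><body>
<ul id="nav"><li><a href="/">Home</a></li><li><a href="/food">Food</a></li><li><a href="/map">Map</a></li></ul>
<div id="list">
  <div class="item"><h3>Cafe A</h3><p>12 Main St</p></div>
  <div class="item"><h3>Cafe B</h3><p>34 Oak Ave</p></div>
  <div class="item"><h3>Cafe C</h3><p>56 Pine Rd</p></div>
</div>
</body></html>
"""

if __name__ == "__main__":
    print(extract_records(SAMPLE))
```

On `SAMPLE`, both the navigation list and the business list qualify as candidates, and the text-length proxy selects the business list. A faithful implementation would replace that proxy with features measured on the rendered page, which requires a browser engine rather than a plain HTML parser.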
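Keywords
Document Object Model (DOM) tree / visual feature / automatic extraction / data record / data region / mining algorithm
Classification
Information Technology and Security Science
Cite this article
HUANG Wuguan (黄武冠), ZHU Ming (朱明), YIN Wenke (尹文科). Web Information Automatic Extraction Based on DOM Tree and Visual Feature [J]. Computer Engineering (计算机工程), 2013, (10): 309-312, 4.
Funding
National Key Technology R&D Program of China (2011BAH11B01); Key Deployment Project of the Chinese Academy of Sciences (KGZD-EW-103-(5))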